Subject: Re: lib/36938: mbtowc misbehaving after invalid char sequence
To: None <gnats-bugs@NetBSD.org>
From: Neil Booth <neil@daikokuya.co.uk>
List: netbsd-bugs
Date: 11/17/2007 21:36:09
Takehiko NOZAKI wrote:-

>  hi, Neil.
>  
>   the current src/lib/libc/citrus/modules/citrus_utf8.c
>  (and other multibyte encoding modules) implementation:
>  
>    219	/* make sure we have the first byte in the buffer */
>    220	if (psenc->chlen == 0) {
>    221		if (n-- < 1)
>    222			goto restart;
>    223		psenc->ch[psenc->chlen++] = *s0++;
>    224	}
>    225
>    226	c = _UTF8_count[psenc->ch[0] & 0xff];
>    227	if (c < 1 || c < psenc->chlen)
>    228		goto ilseq;
>  
>   - reads the first byte into the internal state (line 223).
>   - then checks whether it is a valid character or not (lines 226-227).
>  
>  so the internal state always ends up in a ``non-initial'' state.
>  
>   OTOH many mbtowc(3) implementations
>  (AFAIK glibc2, Solaris, FreeBSD, MSVC++6) seem to:
>  
>   - check whether the first byte is a valid character (if invalid, return -1).
>   - only then store it into the internal state for restart.
>  
>  so the internal state remains in the ``initial'' state.
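
For readers following along, the two orderings described above can be
compared with a toy model.  This is only a sketch: struct toy_state,
toy_count(), store_then_check() and check_then_store() are hypothetical
names invented for illustration, and none of this is the citrus code or
any libc's actual implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy state; NOT the real citrus state structure. */
struct toy_state {
	unsigned char ch[4];	/* bytes buffered so far */
	size_t chlen;		/* number of buffered bytes */
};

/* Expected UTF-8 sequence length for a lead byte, 0 if it cannot lead. */
static int
toy_count(unsigned char b)
{
	if (b < 0x80)
		return 1;
	if (b >= 0xc2 && b <= 0xdf)
		return 2;
	if (b >= 0xe0 && b <= 0xef)
		return 3;
	if (b >= 0xf0 && b <= 0xf4)
		return 4;
	return 0;
}

/* Ordering 1: buffer the byte first, then validate it. */
static int
store_then_check(struct toy_state *st, unsigned char b)
{
	int c;

	assert(st != NULL);
	if (st->chlen == 0)
		st->ch[st->chlen++] = b;
	c = toy_count(st->ch[0]);
	if (c < 1 || (size_t)c < st->chlen)
		return -1;	/* invalid, but the byte stays buffered */
	return c;
}

/* Ordering 2: validate the byte first, buffer it only if it is valid. */
static int
check_then_store(struct toy_state *st, unsigned char b)
{
	int c;

	assert(st != NULL);
	c = toy_count(b);
	if (c < 1)
		return -1;	/* invalid, state left untouched */
	if (st->chlen == 0)
		st->ch[st->chlen++] = b;
	return c;
}
```

Feeding the invalid byte 0xff to a zeroed state makes both functions
return -1, but store_then_check() leaves chlen at 1 while
check_then_store() leaves it at 0.  That is the ``non-initial'' versus
``initial'' difference.
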
>  
>  but ``how to store pieces of a multibyte sequence in the internal state''
>  is implementation-defined behavior, because SUSv3's documentation
>  doesn't mention it (correct me if i'm wrong).
>  
>  http://opengroup.org/onlinepubs/007908799/xsh/mbtowc.html
>  
>  #  in case of mbrtowc(3) and mbstate_t,
>  # "the conversion state is undefined" when return value is (size_t)-1.
>  # 
>  # http://opengroup.org/onlinepubs/007908799/xsh/mbrtowc.html
>  # http://opengroup.org/onlinepubs/007908799/xsh/wchar.h.html
>  
>   so, whether the current locale is stateless or stateful,
>  you cannot omit re-initializing the internal state of mbtowc(3)
>  by #if 0'ing it out, i think.
>  
>  
>  ...but we are in the minority, so we might change the behavior in the future.
>  
>  very truly yours.

Thanks for the detailed explanation; you might be right that
the current behaviour is conforming.  I've asked on comp.std.c;
let's see what the response is.
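
In the meantime, a caller that does not want to depend on either
behaviour can reset the state itself after a failure.  A minimal
sketch, assuming only the standard rule that calling mbtowc() with a
null s returns it to its initial state; mbtowc_with_reset() is a
made-up helper name, not an existing libc function:

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical wrapper: convert one multibyte character and, if the
 * conversion fails, put mbtowc()'s hidden state back to initial.
 */
static int
mbtowc_with_reset(wchar_t *pwc, const char *s, size_t n)
{
	int r;

	assert(s != NULL);
	r = mbtowc(pwc, s, n);
	if (r < 0)
		(void)mbtowc(NULL, NULL, 0);	/* back to the initial state */
	return r;
}
```

With this, the call following an invalid sequence starts from the
initial state whether or not the implementation buffered the bad byte.
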

Neil.