Subject: Re: utf-8 and userland
To: Dave Huang <email@example.com>
From: None <firstname.lastname@example.org>
Date: 03/14/2004 13:30:19
>On Sat, Mar 13, 2004 at 06:03:00PM -0500, James K. Lowden wrote:
>> Last I heard, the ANSI definition of "multibyte character" for mbtowc(3)
>> was something other than UTF-8. How does mbtowc(3) know its input is
>> UTF-8? And what is its output then, UCS-2?
>that "The behaviour of this function is affected by the LC_CTYPE
>category of the current locale." That's how it tells... if LC_CTYPE is
>en_US.UTF-8, mbtowc converts from UTF-8. If it's zh_TW.Big5, it
>converts from Big5.
>The output is a wide character, which is an implementation-defined
>type. I don't know exactly what NetBSD's libc uses for wide
>characters, but it looks to me like UCS-4. However, the Citrus
>Project's web page at http://citrus.bsdclub.org/ mentions that, "...
>design constraints of the class 'Encoding must be ISO 2022' or
>'Encoding must be UCS4' are not acceptable." I don't know if that has
>any bearing on whether wchar_t is a UCS-4 character or not :)
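
For illustration, here is a minimal sketch of the locale dependence described in the quoted text. The helper name decode_first_char and its -2 return convention are hypothetical, not part of any libc. Whether a locale such as en_US.UTF-8 or zh_TW.Big5 is installed is an assumption about the system, so the sketch reports failure rather than guessing.

```c
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/*
 * Hypothetical helper: select LC_CTYPE by name, then decode the first
 * multibyte character of `bytes` with mbtowc(3).
 *
 * Returns whatever mbtowc returns (number of bytes consumed, 0 for a
 * null character, -1 for an invalid sequence), or -2 if the requested
 * locale is not available. Note that it changes the process-wide
 * LC_CTYPE setting and does not restore it.
 */
int decode_first_char(const char *locale_name, const char *bytes,
                      size_t nbytes, wchar_t *out)
{
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -2;                  /* locale not installed */

    (void)mbtowc(NULL, NULL, 0);    /* reset any shift state */
    return mbtowc(out, bytes, nbytes);
}
```

Calling decode_first_char("en_US.UTF-8", "\xc3\xa9", 2, &wc) should consume both bytes as one UTF-8 character where that locale exists; under zh_TW.Big5 the same call would interpret the bytes as Big5 instead. The value stored in wc is deliberately not inspected here, since its encoding is implementation-defined.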
wchar_t has to be handled as opaque data; i.e. you should not assume
a certain encoding. If you need some tests, iswprint() and such are