tech-misc archive


Re: wchar_t encoding?



Paul Koning <Paul_Koning%dell.com@localhost> wrote:
>> > ...
>> > The trouble for NetBSD is that it asks iconv to translate to a
>> > character set named "wchar_t".  That means "whatever the encoding
>> > is for the wchar_t data type".  GNU libiconv supports that, so on
>> > platforms that use that library things are fine.
> 
> I did some digging to see how libiconv implements that feature.
> 
> If __STDC_ISO_10646__ is defined then it simply aliases this to an
> appropriate width Unicode (ucs2 or ucs4).  That applies to Linux, for
> example.
> 
> If it isn't defined (as is the case on NetBSD) but mbrtowc() exists,
> then it uses that function.  More precisely, a conversion to "wchar_t"
> first converts to Unicode, which is then fed into mbrtowc to produce the
> wchar_t encoding.  mbrtowc knows about any locale issues...
>
> I guess that means that "multibyte" is Unicode, or UTF-8???  I don't see
> that documented in any manpage.  It also means that if you have a source
> character that's not in Unicode but is in whatever encoding wchar_t
> uses, it would not be handled by the libiconv implementation of iconv()
> because it uses Unicode as an intermediate form.

Yeah, this fallback seems bogus.  mbtowc & co. expect the source to be
in the current locale's charset, so it's wrong to feed them Unicode data
(even if wchar_t *is* always Unicode internally).
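
To make the point concrete, here is a rough sketch of what mbrtowc() is
meant for: decoding bytes that are already in the charset selected by
LC_CTYPE.  The helper names decode_locale_string() and
wchar_is_iso10646() are made up for this illustration and are not part
of libiconv or libc; __STDC_ISO_10646__ is the standard C macro an
implementation defines when wchar_t values are ISO 10646 code points.

```c
#include <locale.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: decode a NUL-terminated multibyte string that is
 * encoded in the CURRENT locale's charset (LC_CTYPE) into wchar_t values.
 * Returns the number of wide characters stored, or (size_t)-1 if the
 * input is not valid in that charset or does not fit in `out`. */
size_t decode_locale_string(const char *s, wchar_t *out, size_t max)
{
    mbstate_t st;
    size_t n = 0;
    size_t left = strlen(s);

    memset(&st, 0, sizeof st);          /* start in the initial shift state */
    while (left > 0) {
        wchar_t wc;
        size_t r = mbrtowc(&wc, s, left, &st);

        if (r == (size_t)-1 || r == (size_t)-2)
            return (size_t)-1;          /* invalid or truncated sequence */
        if (r == 0)
            break;                      /* decoded a NUL character */
        if (n == max)
            return (size_t)-1;          /* output buffer too small */
        out[n++] = wc;
        s += r;
        left -= r;
    }
    return n;
}

/* Hypothetical helper: reports whether this C implementation promises
 * that wchar_t holds ISO 10646 (Unicode) code points. */
int wchar_is_iso10646(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}
```

The input to decode_locale_string() has to be in whatever charset
setlocale() selected.  Handing it UTF-8 (or any other Unicode form)
only works when the locale's charset happens to be that same encoding,
which is the problem with the fallback described above.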

SY, Uwe
-- 
uwe%stderr.spb.ru@localhost                       |       Zu Grunde kommen
http://snark.ptc.spbu.ru/~uwe/          |       Ist zu Grunde gehen


