tech-misc archive


RE: wchar_t encoding?



> > ...
> > The trouble for NetBSD is that it asks iconv to translate to a
> > character set named "wchar_t".  That means "whatever the encoding
> > is for the wchar_t data type".  GNU libiconv supports that, so on
> > platforms that use that library things are fine.
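
For reference, here is roughly what such a request looks like from the
caller's side.  This is only a sketch: the helper name utf8_to_wchar()
is made up for illustration, the source encoding "UTF-8" is an
assumption, and whether iconv_open() accepts "wchar_t" at all depends
on the iconv implementation, which is the point under discussion.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: convert UTF-8 text to wchar_t by asking iconv for
   the "wchar_t" pseudo-encoding.  Returns the number of wide characters
   written, or (size_t)-1 if this iconv does not know "wchar_t" or the
   conversion fails. */
size_t utf8_to_wchar(const char *utf8, wchar_t *dst, size_t dstlen)
{
    iconv_t cd = iconv_open("wchar_t", "UTF-8");
    /* Some older systems declare iconv()'s input as const char **;
       the POSIX prototype used here takes char **. */
    char *in = (char *)utf8;
    char *out = (char *)dst;
    size_t inleft = strlen(utf8);
    size_t outleft = dstlen * sizeof(wchar_t);
    size_t rc;

    if (cd == (iconv_t)-1)
        return (size_t)-1;          /* "wchar_t" is not supported here */

    rc = iconv(cd, &in, &inleft, &out, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return (size_t)-1;          /* bad input or output buffer too small */

    /* Bytes consumed from the output buffer, expressed in wide chars. */
    return dstlen - outleft / sizeof(wchar_t);
}
```

A caller has to be prepared for the (size_t)-1 case, since an iconv
that lacks the "wchar_t" name fails right at iconv_open().  On systems
where iconv lives in a separate library this needs -liconv at link time.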

I did some digging to see how libiconv implements that feature.

If __STDC_ISO_10646__ is defined, then it simply aliases "wchar_t" to
the Unicode encoding of the appropriate width (UCS-2 or UCS-4).  That
applies to Linux, for example.
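
A quick way to see which case a given platform falls into is to test
the macro directly.  The sketch below assumes the macro in question is
the standard C one, __STDC_ISO_10646__; the two function names are
hypothetical, and the width-to-name mapping is my reading of the
aliasing described above, not code taken from libiconv.

```c
#include <stddef.h>
#include <wchar.h>

/* Returns 1 if the C library promises that wchar_t values are
   ISO 10646 (Unicode) code points, 0 otherwise. */
int wchar_is_unicode(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: the fixed-width Unicode encoding that "wchar_t"
   could be aliased to, or NULL if no such shortcut applies. */
const char *wchar_alias_name(void)
{
    if (!wchar_is_unicode())
        return NULL;                /* must go through the locale instead */
    if (sizeof(wchar_t) == 2)
        return "UCS-2";
    if (sizeof(wchar_t) == 4)
        return "UCS-4";
    return NULL;                    /* unexpected wchar_t width */
}
```

On a platform where the macro is not defined, wchar_alias_name()
returns NULL and the conversion has to take the slower route described
next.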

If it isn't defined (as is the case on NetBSD) but mbrtowc() exists,
then it uses that function.  More precisely, a conversion to "wchar_t"
first converts to Unicode, which is then fed into mbrtowc to produce the
wchar_t encoding.  mbrtowc knows about any locale issues...
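
To make the mbrtowc() step concrete, here is a minimal sketch of
turning a multibyte string into wchar_t values one character at a
time.  The function name mb_to_wchars() is made up for illustration,
and this is not libiconv's actual loop, only the standard way of
driving mbrtowc().

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: convert the multibyte string src, encoded in the
   current locale's multibyte encoding, into at most dstlen wide
   characters.  Returns the number of wide characters stored, or
   (size_t)-1 on an invalid or incomplete sequence. */
size_t mb_to_wchars(const char *src, wchar_t *dst, size_t dstlen)
{
    mbstate_t state;
    size_t remaining = strlen(src);
    size_t count = 0;

    memset(&state, 0, sizeof state);    /* start in the initial shift state */

    while (remaining > 0 && count < dstlen) {
        size_t n = mbrtowc(&dst[count], src, remaining, &state);
        if (n == (size_t)-1 || n == (size_t)-2)
            return (size_t)-1;          /* invalid or truncated sequence */
        if (n == 0)
            break;                      /* decoded a NUL; stop */
        src += n;                       /* n bytes formed one character */
        remaining -= n;
        count++;
    }
    return count;
}
```

What counts as a valid multibyte sequence here is decided entirely by
the LC_CTYPE locale in effect (set with setlocale()), which is why the
encoding of the bytes handed to mbrtowc() matters so much.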

I guess that means the "multibyte" input that mbrtowc() expects is
assumed to be Unicode, or UTF-8?  I don't see that documented in any
manpage.  It also means that a source character that is not in Unicode,
but is in whatever encoding wchar_t uses, would not be handled by the
libiconv implementation of iconv(), because it uses Unicode as an
intermediate form.

        paul

