tech-misc archive


Re: wchar_t encoding?



On Thu, May 20, 2010 at 8:01 AM, Paul Koning <Paul_Koning%dell.com@localhost> 
wrote:
>> > ...
>> > The trouble for NetBSD is that it asks iconv to translate to a
>> > character set named "wchar_t".  That means "whatever the encoding
>> > is for the wchar_t data type".  GNU libiconv supports that, so on
>> > platforms that use that library things are fine.
>
> I did some digging to see how libiconv implements that feature.
>
> If __STDC_ISO_10646__ is defined then it simply aliases this to a
> Unicode encoding of the appropriate width (UCS-2 or UCS-4).  That
> applies to Linux, for example.
>
> If it isn't defined (as is the case on NetBSD) but mbrtowc() exists,
> then it uses that function.  More precisely, a conversion to "wchar_t"
> first converts to Unicode, which is then fed into mbrtowc to produce the
> wchar_t encoding.  mbrtowc knows about any locale issues...
>
> I guess that means that "multibyte" is Unicode, or UTF-8???  I don't see
> that documented in any manpage.  It also means that if you have a source
> character that's not in Unicode but is in whatever encoding wchar_t
> uses, it would not be handled by the libiconv implementation of iconv()
> because it uses Unicode as an intermediate form.
>

I think part of your problem here is mixing terminology. Unicode is
not an encoding; it is a character set that assigns code points to
abstract characters (not glyphs). UTF-8, UTF-16 and UTF-32 are
"encodings" of those code points. Shift-JIS is an encoding too, but
of a different character set.
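
To make the distinction concrete, here is a minimal sketch that takes
one code point and produces two different encodings of it. It is only
an illustration: the function names encode_utf8 and encode_utf16 are
made up, and none of this is taken from libiconv or NetBSD.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-8.
   Returns the number of bytes written (1..4), or 0 if cp is not a
   valid Unicode scalar value. Hypothetical helper for illustration. */
size_t encode_utf8(uint32_t cp, unsigned char out[4])
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                       /* surrogates are not characters */
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                           /* beyond the Unicode range */
}

/* Encode the same code point as UTF-16 code units.
   Returns 1 or 2 units written, or 0 if cp is not a valid scalar value. */
size_t encode_utf16(uint32_t cp, uint16_t out[2])
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;
        return 1;
    }
    if (cp <= 0x10FFFF) {
        cp -= 0x10000;                  /* 20 bits split across a surrogate pair */
        out[0] = (uint16_t)(0xD800 | (cp >> 10));
        out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        return 2;
    }
    return 0;
}
```

For U+20AC this gives the three bytes E2 82 AC in UTF-8 and the single
unit 0x20AC in UTF-16: one code point, two encodings.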

I'll have to go dig out my copy of C99, but "locale dependent" could
mean that the number of bytes a wchar_t contains can vary by locale.
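
One way to check on a given system rather than guess is a small probe
like the sketch below. Two things I am fairly sure of: sizeof(wchar_t)
is evaluated at compile time, so the width is fixed for a given build
and cannot change with setlocale(); and __STDC_ISO_10646__ is the C99
macro an implementation defines when wchar_t values are ISO 10646 code
points. The helper names are made up for illustration.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Width of wchar_t in bytes; a compile-time constant for this build. */
size_t wchar_width_bytes(void)
{
    return sizeof(wchar_t);
}

/* 1 if the implementation promises that wchar_t holds ISO 10646
   (Unicode) code points in every locale, 0 if it makes no promise. */
int wchar_is_iso10646(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}

/* Decode the first multibyte character of s using the current LC_CTYPE
   locale. Returns what mbrtowc() returns: the number of bytes consumed,
   0 for the terminating null, or (size_t)-1 / (size_t)-2 on an invalid
   or incomplete sequence. The caller is expected to have called
   setlocale(LC_CTYPE, "") first if the environment's locale is wanted;
   otherwise the "C" locale is in effect. */
size_t first_wchar(const char *s, wchar_t *out)
{
    mbstate_t st;
    memset(&st, 0, sizeof st);          /* start in the initial shift state */
    return mbrtowc(out, s, strlen(s) + 1, &st);
}
```

Running first_wchar() on the same bytes under different locales, on a
system where wchar_is_iso10646() returns 0, would show whether the
resulting wchar_t values differ by locale.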

James

