tech-misc archive


Re: wchar_t encoding?



James Chacon <chacon.james%gmail.com@localhost> wrote:
> On Thu, May 20, 2010 at 8:01 AM, Paul Koning <Paul_Koning%dell.com@localhost> wrote:
>>> > ...
>>> > The trouble for NetBSD is that it asks iconv to translate to a
>>> > character set named "wchar_t".  That means "whatever the encoding
>>> > is for the wchar_t data type".  GNU libiconv supports that, so on
>>> > platforms that use that library things are fine.
>>
>> I did some digging to see how libiconv implements that feature.
>>
>> If __STDC_ISO_10646__ is defined then it simply aliases this to an
>> appropriate width Unicode (ucs2 or ucs4).  That applies to Linux, for
>> example.
>>
>> If it isn't defined (as is the case on NetBSD) but mbrtowc() exists,
>> then it uses that function.  More precisely, a conversion to "wchar_t"
>> first converts to Unicode, which is then fed into mbrtowc to produce the
>> wchar_t encoding.  mbrtowc knows about any locale issues...
>>
>> I guess that means that "multibyte" is Unicode, or UTF-8???  I don't see
>> that documented in any manpage.  It also means that if you have a source
>> character that's not in Unicode but is in whatever encoding wchar_t
>> uses, it would not be handled by the libiconv implementation of iconv()
>> because it uses Unicode as an intermediate form.
> 
> I think part of your problem here is mixing terminology. Unicode is
> not an encoding,

While it might be technically sloppy, it doesn't change the gist of
the argument.

If we want to be pedantic we should stick to the terminology in
http://unicode.org/reports/tr17/


> it's simply a definition of code points mapping to specific glyphs.

No :).  If we are into nitpicking, then dragging glyphs into this is a
much worse sin against terminology :)


> I'll have to go dig out my C99 but locale dependent could mean the
> number of bytes a wchar_t contains can vary by locale.

No, wchar_t has a fixed size.  It's the bit pattern of wide characters
that is locale dependent.
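
To make the distinction concrete, here is a minimal sketch in C.  The
helper name decode_first and the constant WCHAR_BYTES are made up for
illustration; only setlocale() and mbrtowc() are standard.  The size of
wchar_t is fixed at compile time, while the value mbrtowc() stores
depends on the LC_CTYPE locale in effect.

```c
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* sizeof(wchar_t) is a compile-time constant; no locale can change it. */
enum { WCHAR_BYTES = sizeof(wchar_t) };

/*
 * Hypothetical helper: decode the first multibyte character of s under
 * the named locale and store the resulting wide character in *out.
 * Returns the number of bytes consumed (0 for an empty string after
 * mapping, see below), or (size_t)-1 on any failure.
 */
size_t decode_first(const char *locale_name, const char *s, wchar_t *out)
{
    mbstate_t st;
    size_t n;

    /* The wide character value is defined by the LC_CTYPE category. */
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return (size_t)-1;

    memset(&st, 0, sizeof st);          /* start in the initial shift state */
    n = mbrtowc(out, s, strlen(s), &st);

    if (n == (size_t)-2)                /* incomplete (or empty) input */
        return (size_t)-1;
    return n;                           /* (size_t)-1 is passed through */
}
```

Calling decode_first() on the same character under two different
locales (each with the bytes in that locale's own encoding) may store
different values in *out on a system that does not define
__STDC_ISO_10646__, but WCHAR_BYTES is the same in both cases.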


SY, Uwe
-- 
uwe%stderr.spb.ru@localhost                       |       Zu Grunde kommen
http://snark.ptc.spbu.ru/~uwe/          |       Ist zu Grunde gehen


