RE: wchar_t encoding?

To: "Valeriy E. Ushakov" <uwe%stderr.spb.ru@localhost>, <tech-misc%netbsd.org@localhost>
Subject: RE: wchar_t encoding?
From: "Paul Koning" <Paul_Koning%Dell.com@localhost>
Date: Thu, 20 May 2010 11:01:04 -0400

> > ...
> > The trouble for NetBSD is that it asks iconv to translate to a
> > character set named "wchar_t".  That means "whatever the encoding
> > is for the wchar_t data type".  GNU libiconv supports that, so on
> > platforms that use that library things are fine.

I did some digging to see how libiconv implements that feature.

If __STDC_ISO_10646__ is defined then it simply aliases this to a
Unicode encoding of the appropriate width (UCS-2 or UCS-4).  That
applies to Linux, for example.
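
To make that first case concrete, here is a minimal sketch of the kind
of compile-time check involved.  It is not libiconv's actual code: the
helper name wchar_alias() is made up for illustration, and the encoding
names it returns are my assumption about what the alias would be called.
The only thing it relies on is the standard C macro __STDC_ISO_10646__,
which an implementation defines when wchar_t values are ISO 10646 code
points.

```c
#include <stddef.h>
#include <wchar.h>

/*
 * Hypothetical helper (not part of libiconv): name a Unicode encoding
 * that "wchar_t" could be aliased to, or return NULL when the platform
 * does not promise that wchar_t holds ISO 10646 code points.
 * The returned names are illustrative assumptions.
 */
static const char *wchar_alias(void)
{
#if defined(__STDC_ISO_10646__)
    if (sizeof(wchar_t) == 4)
        return "UCS-4-INTERNAL";   /* 32-bit wchar_t, host byte order */
    if (sizeof(wchar_t) == 2)
        return "UCS-2-INTERNAL";   /* 16-bit wchar_t, host byte order */
#endif
    return NULL;                   /* no guarantee: needs another route */
}
```

On a system that does not define the macro, the helper returns NULL,
which is the situation where some other mechanism has to take over.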

If it isn't defined (as is the case on NetBSD) but mbrtowc() exists,
then it uses that function.  More precisely, a conversion to "wchar_t"
first converts to Unicode, which is then fed into mbrtowc to produce the
wchar_t encoding.  mbrtowc knows about any locale issues...
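
For anyone not familiar with mbrtowc(), this is roughly how it is used
to turn a multibyte string into wchar_t values.  It is only a sketch
under my own assumptions: mb_to_wchars() is a made-up name, the input is
taken to be a NUL-terminated string in the current locale's encoding,
and this is not how libiconv structures its loop.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: decode the multibyte string `src` (interpreted
 * in the current locale's encoding) into at most `outlen` wide
 * characters in `out`.  Returns the number of wide characters written,
 * or (size_t)-1 if the input is invalid or ends in mid-character.
 */
static size_t mb_to_wchars(const char *src, wchar_t *out, size_t outlen)
{
    mbstate_t st;
    size_t left = strlen(src);
    size_t n = 0;

    memset(&st, 0, sizeof st);      /* start in the initial shift state */

    while (left > 0 && n < outlen) {
        size_t r = mbrtowc(&out[n], src, left, &st);

        if (r == (size_t)-1 || r == (size_t)-2)
            return (size_t)-1;      /* invalid or incomplete sequence */
        if (r == 0)
            break;                  /* decoded a NUL; stop here */

        src += r;                   /* skip the bytes just consumed */
        left -= r;
        n++;
    }
    return n;
}
```

Which byte sequences are accepted, and which wchar_t values come out,
depends entirely on the locale in effect when this runs.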

I guess that means the "multibyte" input to mbrtowc() is Unicode, or
UTF-8?  I don't see that documented in any manpage.  It also means that
a source character that is not in Unicode, but is representable in
whatever encoding wchar_t uses, would not be handled by the libiconv
implementation of iconv(), because it uses Unicode as an intermediate
form.

        paul
