tech-misc archive


Re: wchar_t encoding?



Paul Koning <Paul_Koning%dell.com@localhost> wrote:

> I'm working on a patch to gdb 7.1 to make it work on NetBSD.  The issue
> is that GDB 7 uses iconv to handle character strings, and uses wide
> chars internally so it can handle various non-ASCII scripts.
> 
> The trouble for NetBSD is that it asks iconv to translate to a character
> set named "wchar_t".  That means "whatever the encoding is for the
> wchar_t data type".  GNU libiconv supports that, so on platforms that
> use that library things are fine.
>
> The trouble is that I'm getting pushback on the patch, because of
> concerns that the encoding used for wchar_t is not actually UCS-4.
> In particular, there is this article:
> http://www.gnu.org/software/libunistring/manual/libunistring.html#The-wchar_005ft-mess
> which says that on Solaris and FreeBSD the encoding of wchar_t is
> "undocumented and locale dependent".  (Ye gods!)

Why are they so surprised about that?  C99 says:

       3.7.3
       [#1] wide character
       bit representation that fits in an object of type wchar_t,
       capable of representing any character in the current locale

It's simply impossible to always use Unicode as the only encoding for
wchar_t, since not all charsets map 1:1 to Unicode.
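
For illustration, C99 does give a program a way to ask whether the
implementation promises Unicode values in wchar_t: the predefined macro
__STDC_ISO_10646__. A minimal sketch; the helper name wchar_is_iso10646
is made up for this example:

```c
#include <wchar.h>

/* Hypothetical helper: returns 1 if the implementation defines
   __STDC_ISO_10646__, i.e. promises that wchar_t values are ISO 10646
   code points in every locale, and 0 if it makes no such promise. */
static int wchar_is_iso10646(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}
```

A result of 0 does not say which encoding is used, only that the
implementation makes no promise about it.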

Besides, iconv does not return (fsvo "return") wide strings, it
returns a good old pointer to char.  Do they pass a pointer to wchar_t
as the destination?
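
To make the question concrete, here is a sketch of what passing a
wchar_t destination to iconv could look like.  This is an assumption
about how a caller might do it, not a copy of what GDB does; the helper
name iconv_into_wchar is hypothetical, and the "wchar_t" encoding name
is only accepted by some iconv implementations (GNU libiconv, per the
quoted mail):

```c
#include <iconv.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: ask iconv for the "wchar_t" target and let it
   write into a wchar_t array through a char * alias.  Returns the number
   of wide characters produced, or (size_t)-1 if the encoding name is not
   supported or the conversion fails. */
static size_t iconv_into_wchar(const char *fromcode,
                               const char *in, size_t inlen,
                               wchar_t *out, size_t outcap)
{
    iconv_t cd = iconv_open("wchar_t", fromcode);
    if (cd == (iconv_t)-1)
        return (size_t)-1;          /* e.g. no "wchar_t" encoding here */

    char *inp = (char *)in;         /* iconv takes a non-const pointer */
    char *outp = (char *)out;       /* the output buffer is seen as bytes */
    size_t inleft = inlen;
    size_t outleft = outcap * sizeof(wchar_t);

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return (size_t)-1;

    /* iconv counts bytes, so convert the byte count to wide characters. */
    return (size_t)(outp - (char *)out) / sizeof(wchar_t);
}
```

The result is not null-terminated, and what the values in out mean
depends entirely on what the iconv implementation takes "wchar_t" to be.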

If they just assume it's going to be a pointer to a wide string, then
the correct implementation of "wchar_t" is for iconv to convert to a
plain string in the current charset and then convert that to a wide
string.
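
The two-step approach can be sketched in application code as follows.
This is only an illustration under stated assumptions: the helper name
to_wide_via_locale and the fixed 256-byte intermediate buffer are made
up, and it assumes that iconv_open accepts the codeset name returned by
nl_langinfo(CODESET):

```c
#include <iconv.h>
#include <langinfo.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: convert inlen bytes of in (encoded as fromcode)
   to the current locale's charset with iconv, then to a wide string with
   mbstowcs.  outcap is the capacity of out in wchar_t units, including
   the terminator.  Returns the number of wide characters written (not
   counting the terminator), or (size_t)-1 on any failure. */
static size_t to_wide_via_locale(const char *fromcode,
                                 const char *in, size_t inlen,
                                 wchar_t *out, size_t outcap)
{
    char mb[256];                   /* assumed large enough for the sketch */
    char *inp = (char *)in;
    char *outp = mb;
    size_t inleft = inlen;
    size_t outleft = sizeof(mb) - 1;    /* keep room for '\0' */

    /* Step 1: source encoding -> multibyte string in the locale charset. */
    iconv_t cd = iconv_open(nl_langinfo(CODESET), fromcode);
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    if (rc != (size_t)-1)           /* return to the initial shift state */
        rc = iconv(cd, NULL, NULL, &outp, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return (size_t)-1;
    *outp = '\0';

    /* Step 2: locale multibyte string -> wide string. */
    size_t n = mbstowcs(out, mb, outcap);
    if (n == (size_t)-1 || n == outcap)  /* invalid input or no terminator */
        return (size_t)-1;
    return n;
}
```

Both steps go through the current locale, so the resulting wchar_t
values are whatever that locale uses, Unicode or not.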

Or do they actually assume it's going to be UTF-32?

SY, Uwe
-- 
uwe%stderr.spb.ru@localhost                       |       Zu Grunde kommen
http://snark.ptc.spbu.ru/~uwe/          |       Ist zu Grunde gehen


