tech-misc archive


wchar_t encoding?



Gents,

I'm working on a patch to gdb 7.1 to make it work on NetBSD.  The issue
is that GDB 7 uses iconv to handle character strings, and uses wide
chars internally so it can handle various non-ASCII scripts.

The trouble for NetBSD is that GDB asks iconv to translate to a
character set named "wchar_t".  That means "whatever the encoding is for
the wchar_t data type".  GNU libiconv supports that name, so on
platforms that use that library things are fine.

NetBSD supports iconv, but its iconv doesn't know the "wchar_t" encoding
name.  So I proposed a patch that substitutes the encoding that appears
to be used for wchar_t there, namely UCS-4 in platform native byte order
(so "ucs-4le" on x86, for example).  This seems to work.
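
To make the idea concrete, here is a minimal sketch of the substitution
(not the actual patch).  The helper names native_ucs4_name() and
open_wchar_converter() are made up for illustration, and the fallback
rests entirely on the unconfirmed assumption that wchar_t holds UCS-4 in
native byte order:

```c
#include <iconv.h>
#include <stdint.h>

/* Hypothetical helper: name of UCS-4 in this machine's byte order. */
const char *native_ucs4_name(void)
{
    const uint32_t probe = 1;
    /* On a little-endian host the first byte of the value 1 is 1. */
    return (*(const unsigned char *)&probe == 1) ? "ucs-4le" : "ucs-4be";
}

/* Hypothetical helper: open a converter from `fromcode` to wide chars.
   Returns (iconv_t)-1 if neither name is accepted. */
iconv_t open_wchar_converter(const char *fromcode)
{
    /* Preferred: let iconv pick the wchar_t encoding itself. */
    iconv_t cd = iconv_open("wchar_t", fromcode);
    if (cd == (iconv_t)-1) {
        /* ASSUMPTION: wchar_t is UCS-4 in native byte order.  This is
           exactly the point in dispute; it is not a documented fact. */
        cd = iconv_open(native_ucs4_name(), fromcode);
    }
    return cd;
}
```

A caller would use open_wchar_converter("UTF-8") in place of
iconv_open("wchar_t", "UTF-8") and otherwise treat the descriptor the
same way.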

The trouble is that I'm getting pushback on the patch, because of
concerns that the encoding used for wchar_t is not actually UCS-4.  In
particular, there is this article:
http://www.gnu.org/software/libunistring/manual/libunistring.html#The-wchar_005ft-mess
which says that on Solaris and FreeBSD the encoding of wchar_t is
"undocumented and locale dependent".  (Ye gods!)

Now, NetBSD is not FreeBSD... so... what is the answer for NetBSD?  Is
it like FreeBSD?  (If so, it would be good to fix that.)  Or is it a
fixed encoding, and if so, is it indeed ucs-4?
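
One compile-time hint, sketched below: C99 says an implementation
defines __STDC_ISO_10646__ when wchar_t values are ISO 10646 code
points.  I don't know whether NetBSD defines it, and an undefined macro
only means the implementation makes no such promise, not that the
encoding is something else.  The function name is made up for
illustration:

```c
#include <wchar.h>

/* Hypothetical helper: 1 if the C implementation promises that wchar_t
   values are ISO 10646 (Unicode) code points, 0 if it makes no promise. */
int wchar_is_iso10646(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}
```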

Thanks,
        paul

