tech-misc archive


RE: wchar_t encoding?



> http://www.gnu.org/software/libunistring/manual/libunistring.html#The-wchar_005ft-mess
> > which says that on Solaris and FreeBSD the encoding of wchar_t is
> > "undocumented and locale dependent".  (Ye gods!)
> 
> Why are they so surprised about that?  C99 says:
> 
>        3.7.3
>        [#1] wide character
>        bit  representation  that fits in an object of type wchar_t,
>        capable of representing any character in the current locale
> 
> It's simply impossible to always use unicode as the only encoding for
> wchar_t, since not all charsets are 1:1 with unicode.

That wasn't "they" -- the editorial comment was mine.  I thought that
by now Unicode was complete enough to handle other charsets.  It
sounds like that's not true, or at least wasn't 12 years ago.  Can
you give an example of a charset for which Unicode is not sufficient?
 
> Besides, iconv does not return (fsvo "return") wide strings, it
> returns good old pointer to char.  Do they pass a pointer to wchar_t
> as destination?

Yes.  The iconv documentation says that the arguments are buffer
pointers, so their type is whatever the source or destination encoding
name implies.
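
For concreteness, a call of that kind might look like the sketch below.
It assumes the iconv implementation accepts the encoding name "WCHAR_T"
(glibc and GNU libiconv do; POSIX does not require it).  The helper
name utf8_to_wchar is made up for illustration.

```c
#include <iconv.h>
#include <string.h>
#include <wchar.h>

/* Convert a UTF-8 string into a wchar_t buffer via iconv.
   Returns the number of wide characters written, or (size_t)-1 on error.
   Assumption: the iconv in use recognizes the name "WCHAR_T". */
static size_t utf8_to_wchar(const char *src, wchar_t *dst, size_t dst_len)
{
    iconv_t cd = iconv_open("WCHAR_T", "UTF-8");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    /* iconv takes plain char pointers and byte counts on both sides,
       so the wchar_t buffer is passed as bytes. */
    char *in = (char *)src;
    size_t in_left = strlen(src);
    char *out = (char *)dst;
    size_t out_left = dst_len * sizeof(wchar_t);

    size_t rc = iconv(cd, &in, &in_left, &out, &out_left);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return (size_t)-1;

    /* Bytes consumed in the output buffer, expressed in wide characters. */
    return dst_len - out_left / sizeof(wchar_t);
}
```

Whatever ends up in dst is in the platform's wchar_t encoding; the call
itself does not tell the caller which encoding that is.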
 
> If they just assume it's going to be a pointer to wide string, then
> correct implementation of "wchar_t" is for iconv to convert to a plain
> string in current charset and then convert that to a wide string.
> 
> Or do they actually assume it's gonna be utf32?

No, that's exactly the issue.

The C99 rule you quoted says (or at least implies) that the encoding of
wchar_t is locale dependent.  So the question is: how does a program
find out WHAT encoding wchar_t uses right now?  I don't see any API for
obtaining that information.  Clearly this is necessary -- how else can a
program construct properly encoded wide char data if it needs to do so
(as GDB does)?
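
One partial answer that C99 itself provides is the predefined macro
__STDC_ISO_10646__: if it is defined, wchar_t values are ISO/IEC 10646
(Unicode) code points in every locale.  It only answers yes or no and
does not name the encoding when the answer is no.  In that case the one
portable route is probably to let the C library do the conversion from
the locale's multibyte charset.  A minimal sketch, with made-up helper
names wchar_is_unicode and locale_to_wide:

```c
#include <stdlib.h>
#include <wchar.h>

/* Returns 1 if the implementation promises that wchar_t holds
   ISO/IEC 10646 (Unicode) code points, 0 if it makes no such promise.
   A 0 result does not say what the encoding actually is. */
static int wchar_is_unicode(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}

/* Fallback that needs no knowledge of the wchar_t encoding: convert a
   multibyte string in the current locale's charset with mbstowcs().
   Returns the number of wide characters written (excluding any
   terminator), or (size_t)-1 on an invalid multibyte sequence. */
static size_t locale_to_wide(const char *src, wchar_t *dst, size_t dst_len)
{
    return mbstowcs(dst, src, dst_len);
}
```

This sidesteps the question rather than answering it: the program gets
correctly encoded wide data for the current locale, but still cannot
name the encoding it was given.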

        paul 

