tech-misc archive
Re: wchar_t encoding?
Paul Koning <Paul_Koning%dell.com@localhost> wrote:
>> It's simply impossible to always use unicode as the only encoding for
>> wchar_t, since not all charsets are 1:1 with unicode.
>
> That wasn't "they" -- the editorial comment was mine. I thought that
> Unicode by now is complete enough to be able to handle other charsets.
> It sounds like that's not true, or at least wasn't 12 years ago. Can
> you give an example of a charset for which Unicode is not sufficient?
I can invent an infinite number of them - it's a matter of principle :).
The whole point is that the C locale API (warts and all) is supposed to be
*completely* agnostic of charset internals: you should be able to define
external locale information as grokked by your C library, set your
LC_CTYPE & co. accordingly, and a well-behaved C program is supposed to
just work.
For a real-life example, consider something like CSX (Classical
Sanskrit Extended - a charset used to represent Latin transliteration
of classical Sanskrit). It has e.g. a character for "r with dot below
with macron with acute". Of course you can represent it using Unicode
(you can iconv between CSX and UTF-*), but you will need a sequence of
combining marks, i.e. it's not a 1:1 mapping, so a single Unicode wchar_t
cannot represent that character.
SY, Uwe
--
uwe%stderr.spb.ru@localhost | Zu Grunde kommen
http://snark.ptc.spbu.ru/~uwe/ | Ist zu Grunde gehen