tech-misc archive


Re: wchar_t encoding?



Paul Koning <Paul_Koning%dell.com@localhost> wrote:

>> It's simply impossible to always use unicode as the only encoding for
>> wchar_t, since not all charsets are 1:1 with unicode.
> 
> That wasn't "they" -- the editorial comment was mine.  I thought that
> Unicode by now is complete enough to be able to handle other charsets.
> It sounds like that's not true, or at least wasn't 12 years ago.  Can
> you give an example of a charset for which Unicode is not sufficient?

I can invent an infinite number of them - it's a matter of principle :).
The whole point is that the C locale API (warts and all) is supposed to be
*completely* agnostic of charset internals: you should be able to define
external locale information as grokked by your C library, set your
LC_CTYPE & co. accordingly, and a well-behaved C program is supposed to
just work.
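
To make "well behaved" concrete, here is a minimal sketch of a routine that
never looks inside a wchar_t.  The function name count_alpha is made up for
illustration; the calls it uses (mbrtowc, iswalpha) are standard C.  It
assumes the caller has already selected a locale, e.g. with
setlocale(LC_CTYPE, "").

```c
#include <assert.h>
#include <locale.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Count alphabetic characters in a multibyte string using only the
 * standard locale API.  Nothing here assumes which values wchar_t
 * holds; classification is left entirely to the C library.
 * Returns (size_t)-1 if the input is not valid in the current locale. */
size_t count_alpha(const char *s)
{
    assert(s != NULL);

    mbstate_t st;
    memset(&st, 0, sizeof st);          /* initial conversion state */

    size_t left = strlen(s);
    size_t count = 0;

    while (left > 0) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, s, left, &st);

        if (n == (size_t)-1 || n == (size_t)-2)
            return (size_t)-1;          /* invalid or truncated sequence */
        if (n == 0)
            break;                      /* NUL reached; defensive only */

        if (iswalpha((wint_t)wc))       /* the locale decides, not us */
            count++;

        s += n;
        left -= n;
    }
    return count;
}
```

The same code should work unchanged whether the locale's wchar_t values
happen to be Unicode code points or something else entirely, which is the
property being argued for here.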

For a real-life example, consider something like CSX (Classical
Sanskrit eXtended - a charset used to represent Latin transliteration
of classical Sanskrit).  It has e.g. a character for "r with dot below
with macron with acute".  Of course you can represent it using Unicode
(you can iconv between CSX and UTF-*), but you will need a sequence of
combining marks, i.e. it's not a 1:1 mapping, so a single Unicode
wchar_t cannot represent that character.
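
The mismatch can be sketched by counting code points.  The encodings below
are my assumption about how that CSX character comes out in Unicode:
U+1E5D (r with dot below and macron) followed by U+0301 (combining acute),
or fully decomposed as "r" plus three combining marks.  The helper name
utf8_code_points is made up for illustration, and it assumes well-formed
UTF-8 input.

```c
#include <assert.h>
#include <stddef.h>

/* Count Unicode code points in a well-formed UTF-8 string by counting
 * every byte that is not a continuation byte (10xxxxxx). */
size_t utf8_code_points(const char *s)
{
    assert(s != NULL);

    size_t count = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++) {
        if ((*p & 0xC0) != 0x80)
            count++;
    }
    return count;
}

/* Assumed Unicode spellings of "r with dot below with macron with acute". */

/* U+1E5D U+0301: precomposed r-dot-below-macron + combining acute */
static const char r_partly_composed[] = "\xE1\xB9\x9D\xCC\x81";

/* U+0072 U+0323 U+0304 U+0301: r + dot below + macron + acute */
static const char r_fully_decomposed[] = "r\xCC\xA3\xCC\x84\xCC\x81";
```

Either way the count is greater than one, so the character is one unit in
CSX but needs several wchar_t elements if wchar_t holds Unicode code points.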


SY, Uwe
-- 
uwe%stderr.spb.ru@localhost                       |       Zu Grunde kommen
http://snark.ptc.spbu.ru/~uwe/          |       Ist zu Grunde gehen


