Re: wchar_t encoding?

To: tech-misc%netbsd.org@localhost
Subject: Re: wchar_t encoding?
From: uwe%stderr.spb.ru@localhost (Valeriy E. Ushakov)
Date: Thu, 20 May 2010 18:11:37 +0000 (UTC)

Paul Koning <Paul_Koning%dell.com@localhost> wrote:

>> It's simply impossible to always use unicode as the only encoding for
>> wchar_t, since not all charsets are 1:1 with unicode.
> 
> That wasn't "they" -- the editorial comment was mine.  I thought that
> Unicode by now is complete enough to be able to handle other charsets.
> It sounds like that's not true, or at least wasn't 12 years ago.  Can
> you give an example of a charset for which Unicode is not sufficient?

I can invent an infinite number of them - it's a matter of principle :).
The whole point is that the C locale API (warts and all) is supposed to
be *completely* agnostic of charset internals: you should be able to
define external locale information as grokked by your C library, set
your LC_CTYPE & co. accordingly, and a well-behaved C program is
supposed to just work.
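
As a rough illustration of what "charset agnostic" means in practice,
here is a minimal sketch of a helper that counts characters (not bytes)
using only the standard multibyte API.  The name count_chars is made up
for this example; the point is that nothing in it assumes what a
wchar_t value means, only that the C library can decode the current
locale's charset.

```c
#include <string.h>
#include <wchar.h>

/*
 * Count the characters (not bytes) in s, as decoded by the current
 * LC_CTYPE locale.  Returns (size_t)-1 if s is not a valid multibyte
 * string in that locale.  count_chars is a hypothetical helper name.
 */
size_t count_chars(const char *s)
{
    mbstate_t st;
    size_t len = strlen(s);
    size_t n = 0;

    memset(&st, 0, sizeof st);          /* start in the initial shift state */
    while (len > 0) {
        wchar_t wc;
        size_t r = mbrtowc(&wc, s, len, &st);

        if (r == (size_t)-1 || r == (size_t)-2)
            return (size_t)-1;          /* invalid or incomplete sequence */
        if (r == 0)
            break;                      /* decoded a NUL; stop */
        s += r;                         /* skip the bytes just consumed */
        len -= r;
        n++;                            /* one wchar_t per character */
    }
    return n;
}
```

A caller would normally do setlocale(LC_CTYPE, "") once at startup
(from <locale.h>) so that the user's environment picks the charset;
without that call the program runs in the "C" locale.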

For a real-life example, consider something like CSX (Classical
Sanskrit eXtended - a charset used to represent Latin transliteration
of classical Sanskrit).  It has e.g. a character for "r with dot below
with macron with acute".  Of course you can represent it using unicode
(you can iconv between csx and utf*), but you will need a sequence
with combining marks, i.e. it's not a 1:1 mapping, so a single unicode
wchar_t cannot represent that character.
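
To make the counting concrete, here is a small sketch under the
assumption that wchar_t holds unicode code points (true where the
implementation defines __STDC_ISO_10646__, but not guaranteed by C).
The array and function names are invented for the example.  As far as
I know the closest precomposed unicode character is U+1E5D (r with dot
below and macron), so the acute still has to be a separate combining
mark.

```c
#include <stddef.h>
#include <wchar.h>

/* Fully decomposed: r, combining dot below, combining macron,
 * combining acute. */
static const wchar_t r_decomposed[] = L"r\u0323\u0304\u0301";

/* Using the precomposed U+1E5D (r with dot below and macron), the
 * acute is still a separate combining mark. */
static const wchar_t r_composed[] = L"\u1E5D\u0301";

/* Number of wchar_t elements needed for one user-visible character. */
size_t wchar_units(const wchar_t *s)
{
    return wcslen(s);
}
```

Either way wchar_units() reports more than one element (4 and 2
respectively) for what CSX treats as a single character.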

SY, Uwe
-- 
uwe%stderr.spb.ru@localhost                       |       Zu Grunde kommen
http://snark.ptc.spbu.ru/~uwe/          |       Ist zu Grunde gehen
