tech-userlevel archive


Re: wide characters and i18n



> BTW, I think Plan 9's insistence that everything "textual" inside the
> system always be in Unicode in UTF-8 all the time is one of its key
> features.

Maybe, but I think it's a horrible, horrible mistake.  The only excuse
for UTF-8 is to shoehorn Unicode into a system with the "char = 8 bits"
assumption already wired deeply into it.  A system built from the start
to use Unicode really should be using 16-bit chars - or 24-bit (or
maybe 32-bit, depending) if you want to support more than the BMP.
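
To put numbers on the size tradeoff, here is a minimal Python sketch
(not part of the original message; the helper name code_units and the
sample characters are chosen only for illustration).  It counts how
many code units one character occupies in UTF-8, UTF-16, and UTF-32.

```python
def code_units(ch):
    """Return (UTF-8 bytes, UTF-16 units, UTF-32 units) for one character.

    The -le codecs are used so that no byte-order mark is prepended,
    which would otherwise inflate the counts.
    """
    return (
        len(ch.encode("utf-8")),           # 8-bit units
        len(ch.encode("utf-16-le")) // 2,  # 16-bit units
        len(ch.encode("utf-32-le")) // 4,  # 32-bit units
    )


# Illustrative samples: three BMP characters and one outside the BMP.
SAMPLES = ["\u0041", "\u03b1", "\u20ac", "\U00010330"]

if __name__ == "__main__":
    for ch in SAMPLES:
        print("U+%04X" % ord(ch), code_units(ch))
    # U+0041 (1, 1, 1)
    # U+03B1 (2, 1, 1)
    # U+20AC (3, 1, 1)
    # U+10330 (4, 2, 1)
```

Every BMP character fits in one 16-bit unit, but the last sample
needs a surrogate pair, which is why a fixed-width scheme beyond the
BMP needs more than 16 bits per character.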

>> Which still leaves open the problem of locales and issues of
>> multi-lingual documents and applications where a single Unicode
>> glyph really should be represented differently depending upon what
>> language it is being used for,

Or cases where what, in isolation, could reasonably be said to be the
same character needs to be encoded differently depending on what
language it's for.  For example, Latin, Cyrillic, and Greek each have a
character formed by two parallel vertical strokes joined by a
horizontal stroke approximately midway up, but it means something
fairly drastically different to each of them.  Yet, written on a piece
of paper in isolation, there is no difference; any of them could
equally well be taken to be any of the others.
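
As an illustration of that example, the following Python sketch (an
addition, not from the original message; the helper name describe is
hypothetical) uses the standard unicodedata module to show that
Unicode does give the three look-alike capitals separate code points,
and that their lowercase forms diverge.

```python
import unicodedata

# The three visually similar capitals: Latin H, Cyrillic En, Greek Eta.
LOOKALIKES = ["\u0048", "\u041d", "\u0397"]


def describe(ch):
    """Return (code point label, Unicode name, lowercase form) for ch."""
    return ("U+%04X" % ord(ch), unicodedata.name(ch), ch.lower())


if __name__ == "__main__":
    for ch in LOOKALIKES:
        label, name, lower = describe(ch)
        print(label, name, "lowercase U+%04X" % ord(lower))
    # U+0048 LATIN CAPITAL LETTER H lowercase U+0068
    # U+041D CYRILLIC CAPITAL LETTER EN lowercase U+043D
    # U+0397 GREEK CAPITAL LETTER ETA lowercase U+03B7
```

The capitals may render identically, but the names and the lowercase
mappings show that each belongs to a different letter in its own
script.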

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mouse%rodents-montreal.org@localhost
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

