tech-userlevel archive


Re: A draft for a multibyte and multi-codepoint C string interface



On Mon, Apr 15, 2013 at 11:26:53AM -0400, Mouse wrote:
>
> (...) sometimes appears to be designed to
> draw semantically important but graphically irrelevant distinctions
> (such as having different codepoints for LATIN CAPITAL LETTER A and
> GREEK CAPITAL LETTER ALPHA).
> 

But in this case the distinction is important: the same _individual_
sign/grapheme used in two distinct languages has to have different
codepoints, because one can then deduce some meta (whole-language)
properties from the range (language) the codepoint falls in.

I hope this is the case, because it would mean that a typographical
system (so, once again, nothing at the system level) can deduce the
direction of rendering from the codepoint.
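
As a rough illustration (a sketch only, not part of any TeX design):
the Unicode Character Database records a bidirectional class for each
codepoint, and Python's standard unicodedata module exposes it. The
helper name base_direction is made up for this example, and it only
looks at the strong classes (L, R, AL); a real renderer would have to
run the full bidirectional algorithm over the whole string.

```python
import unicodedata

def base_direction(ch):
    """Map the bidi class of one character to 'ltr', 'rtl', or None.

    Hypothetical helper: only strong classes give a direction here.
    """
    bidi = unicodedata.bidirectional(ch)
    if bidi in ("R", "AL"):   # Hebrew-like and Arabic letters
        return "rtl"
    if bidi == "L":           # Latin, Greek, Cyrillic, ...
        return "ltr"
    return None               # digits, spaces, punctuation: weak or neutral

# LATIN A, GREEK ALPHA, HEBREW ALEF, ARABIC ALEF
for ch in ("A", "\u0391", "\u05D0", "\u0627"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {base_direction(ch)}")
```

Note that LATIN CAPITAL LETTER A and GREEK CAPITAL LETTER ALPHA both
come out as 'ltr': the direction comes from a per-codepoint property,
not from the range as such.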

If the so-called Arabic digits were exactly the same glyphs in Arabic
and in Western languages (I doubt this is actually the case), using
the ASCII codepoints in all these languages would make it impossible
to deduce the direction of rendering automatically. The writer just
produces a string of characters, which has no 2D direction: it is 1D,
we only know where it starts, so the input system is unaware of the
rendering direction. He would then have to reorder the digits by hand
depending on the result he finally wants.
(Most significant digit first is a Greek thing, and a logical one:
most important things first, as in dates, where putting the year first
allows fast sorting. If I'm not mistaken, a number looks the "same" in
Arabic because its digits are written left-to-right, that is, least
significant digit first when read right-to-left.)
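
To make the digit case concrete, here is a small sketch, again with
Python's unicodedata. It assumes the Arabic-Indic digits U+0660..U+0669
are the ones meant, and digit_info is a hypothetical helper. ASCII
digits report the class EN (European number) and Arabic-Indic digits
report AN (Arabic number), so the two sets are distinguishable by
codepoint, and in both cases the 1D string is stored most significant
digit first.

```python
import unicodedata

def digit_info(ch):
    """Return (numeric value, bidi class) for one decimal digit character."""
    return (unicodedata.digit(ch), unicodedata.bidirectional(ch))

ascii_seven = "7"
arabic_indic_seven = "\u0667"   # ARABIC-INDIC DIGIT SEVEN

print(digit_info(ascii_seven))
print(digit_info(arabic_indic_seven))

# The logical (stored) order is the same for both digit sets:
# ARABIC-INDIC DIGIT ONE, TWO, THREE parses like "123".
print(int("\u0661\u0662\u0663"))
```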

I hope this is indeed a property of Unicode (even if only as a side
effect of allocating contiguous ranges to separate languages), because
I'm basing some design hypotheses for UTF-8 TeX upon it...

-- 
        Thierry Laronde <tlaronde +AT+ polynum +dot+ com>
                      http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C

