Re: A draft for a multibyte and multi-codepoint C string interface

To: tech-userlevel%netbsd.org@localhost
Subject: Re: A draft for a multibyte and multi-codepoint C string interface
From: Steffen "Daode" Nurpmeso <sdaoden%gmail.com@localhost>
Date: Tue, 02 Apr 2013 17:31:03 +0200

tlaronde%polynum.com@localhost wrote:
 |For Unicode complexity, all is not gratuitous. I found this when
 |thinking about the next step for kerTeX: adding Unicode/UTF-8 for TeX.
 |
 |Example: in Occident, we use arabic digits. If in occidental languages
 |and Arabic the "individual" digits are the same, considering that they
 |are part of a special set, they are not identical. If the digits are,
 |for occidental languages, in the ASCII range, in the arabic language
 |they should not. Because, based on the code, one can deduce the
 |language, and for example the direction of composition. Hence, TeX---to
 |take this example---could deduce the direction of composition from the
 |Unicode range.

It's even worse, since some languages use different numeral
systems (like base 20), have no notion of the value 0 and/or have
special symbols for several important numbers, like "1000".  Of
course a digittoi() cannot handle these cases (and, AFAIK, Unicode
didn't put any effort into this: a digit value is only defined if
a direct mapping is possible).
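To illustrate what "a digit value is only defined if a direct mapping is possible" means in practice, here is a minimal sketch of a digittoi() over UTF-32 code points.  The function body and the table of scripts are my own assumptions, not the draft's interface.  It relies on the Unicode guarantee that every run of decimal digits (General_Category Nd) is ten contiguous code points, ordered 0 through 9, so the value is just the offset from that run's zero.  Characters such as U+216F ROMAN NUMERAL ONE THOUSAND or the CJK ideograph for "thousand" (U+5343) carry a numeric value in Unicode but are not decimal digits, so a per-character mapping like this one has nothing sensible to return for them.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table: the code point of DIGIT ZERO for a few Nd runs.
 * Unicode guarantees each Nd run is ten contiguous code points (0..9),
 * so a digit's value is simply its offset from the run's zero.
 * Only a handful of scripts are listed here, for illustration. */
static const uint32_t digit_zeros[] = {
    0x0030, /* DIGIT ZERO (ASCII) */
    0x0660, /* ARABIC-INDIC DIGIT ZERO */
    0x06F0, /* EXTENDED ARABIC-INDIC DIGIT ZERO */
    0x0966, /* DEVANAGARI DIGIT ZERO */
    0xFF10, /* FULLWIDTH DIGIT ZERO */
};

/* Hypothetical digittoi(): return the decimal value of code point c,
 * or -1 if c is not a decimal digit in one of the runs above. */
int digittoi(uint32_t c)
{
    size_t i;

    for (i = 0; i < sizeof digit_zeros / sizeof digit_zeros[0]; i++)
        if (c >= digit_zeros[i] && c <= digit_zeros[i] + 9)
            return (int)(c - digit_zeros[i]);
    return -1; /* not a (known) decimal digit: no direct mapping */
}
```

With this, digittoi(0x0663) (ARABIC-INDIC DIGIT THREE) yields 3, while digittoi(0x216F) and digittoi(0x5343) yield -1: the "1000" symbols simply fall outside what a digit-by-digit mapping can express.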

So, for this, some locale-dependent pre- or post-parser is, or
would be, necessary.  I don't know of any implementation that
actually does this, and the current POSIX / C environment offers
no way to implement such pre-/postprocessors.  But I also wouldn't
really worry about that: the Inuit, the Indians and the like have
brand-new writing systems that they didn't invent on their own and
which use a Latin-ish notation, other languages are dead and
buried, and the rest doesn't matter either, at least for the
computer programs we are talking about.
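As a rough sketch of what such a post-parser has to do once numerals stop being positional, consider Roman numerals, which have a dedicated symbol for 1000 and no zero.  This is only an illustration I picked, and roman_to_long() is a hypothetical name, not part of any proposed interface.  The value of a symbol depends on its right-hand neighbour (a smaller symbol before a larger one is subtracted), so the conversion needs context beyond the single character and cannot be reduced to a digittoi()-style lookup.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: value of one Roman numeral letter, 0 if none. */
static int roman_value(char c)
{
    switch (c) {
    case 'I': return 1;
    case 'V': return 5;
    case 'X': return 10;
    case 'L': return 50;
    case 'C': return 100;
    case 'D': return 500;
    case 'M': return 1000;
    default:  return 0;
    }
}

/* Hypothetical post-parser: convert an ASCII Roman numeral to its value,
 * or return -1 if a character is not a numeral letter.  Applies the usual
 * subtractive rule ("IV" = 4, "CM" = 900); it does not reject
 * non-canonical forms such as "IIII". */
long roman_to_long(const char *s)
{
    long total = 0;
    size_t i, n = strlen(s);

    for (i = 0; i < n; i++) {
        int v = roman_value(s[i]);

        if (v == 0)
            return -1;
        /* A symbol's sign depends on its neighbour: context is required. */
        if (i + 1 < n && v < roman_value(s[i + 1]))
            total -= v;
        else
            total += v;
    }
    return total;
}
```

For example, roman_to_long("MCMXCIV") evaluates to 1994 and roman_to_long("MMXIII") to 2013.  Every other system of this kind (base 20, special words or symbols for large units) would need its own such routine, which is exactly the per-locale hook that POSIX / C does not provide.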

--steffen

Follow-Ups:
- Re: A draft for a multibyte and multi-codepoint C string interface
  - From: tlaronde

References:
- A draft for a multibyte and multi-codepoint C string interface
  - From: Daode
- Re: A draft for a multibyte and multi-codepoint C string interface
  - From: Mouse
- Re: A draft for a multibyte and multi-codepoint C string interface
  - From: Daode
- Re: A draft for a multibyte and multi-codepoint C string interface
  - From: Mouse
- Re: A draft for a multibyte and multi-codepoint C string interface
  - From: James K. Lowden
- Re: A draft for a multibyte and multi-codepoint C string interface
  - From: tlaronde
