Re: A draft for a multibyte and multi-codepoint C string interface

To: tech-userlevel%netbsd.org@localhost
Subject: Re: A draft for a multibyte and multi-codepoint C string interface
From: tlaronde%polynum.com@localhost
Date: Mon, 1 Apr 2013 10:29:38 +0200

On Sun, Mar 31, 2013 at 06:28:47PM -0400, James K. Lowden wrote:
> 
> Certainly the Worse is Better school would push the problem out to
> userland and absolve the filesystem.   ISTM filenames are there for the
> user's sake, and filename uniqueness is judged at the semantic level of
> linguistic perception.  Leaving him to fend for himself against
> Unicode's unfortunate complexity is a disservice.  
> 

Not all of Unicode's complexity is gratuitous. I found this out when
thinking about the next step for kerTeX: adding Unicode/UTF-8 to TeX.

Example: in the West, we use Arabic digits. Even if the individual
digits look the same in Western languages and in Arabic, they belong
to different sets and so are not identical. For Western languages the
digits are in the ASCII range; for Arabic they should not be, because
from the code one can deduce the language and, for example, the
direction of composition. Hence TeX, to take this example, could deduce
the direction of composition from the Unicode range.
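
To make the digit example concrete, here is a minimal sketch in C,
assuming the three code point ranges Unicode assigns to ASCII digits,
Arabic-Indic digits and Extended Arabic-Indic digits. The names
digit_set and digit_set_of are hypothetical, invented for this
illustration; nothing like this exists in kerTeX today.

```c
#include <stdint.h>

/* Hypothetical names, for illustration only. */
enum digit_set {
    DIGIT_NONE,          /* not a digit in any range below */
    DIGIT_ASCII,         /* U+0030..U+0039 */
    DIGIT_ARABIC_INDIC,  /* U+0660..U+0669 */
    DIGIT_EXT_ARABIC     /* U+06F0..U+06F9 */
};

/* Classify a code point by the digit range it falls in. */
enum digit_set digit_set_of(uint32_t cp)
{
    if (cp >= 0x0030 && cp <= 0x0039)
        return DIGIT_ASCII;
    if (cp >= 0x0660 && cp <= 0x0669)
        return DIGIT_ARABIC_INDIC;
    if (cp >= 0x06F0 && cp <= 0x06F9)
        return DIGIT_EXT_ARABIC;
    return DIGIT_NONE;
}

/* Numeric value 0..9 of a digit, or -1 if cp is not a digit
 * in one of the three ranges above. */
int digit_value(uint32_t cp)
{
    switch (digit_set_of(cp)) {
    case DIGIT_ASCII:        return (int)(cp - 0x0030);
    case DIGIT_ARABIC_INDIC: return (int)(cp - 0x0660);
    case DIGIT_EXT_ARABIC:   return (int)(cp - 0x06F0);
    default:                 return -1;
    }
}
```

The same value 7 thus has distinct codes ('7' and U+0667), and the
range alone says which set was meant. This only hints at the script;
a real typesetter would presumably still need the full bidirectional
rules to lay out the line.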

There is a simple solution, the one developed by Ken Thompson et al.
at Bell Labs: UTF-8. As far as the system is concerned,
filenames should be octet strings (UTF-8), and the same filename
is the exact same string. No semantics at all. (I simply hate
filesystems that are case-insensitive, and I simply don't want the
disease to go any further. Two different codepoints are two different
characters. Would you also want to consider the font they are rendered
with? After all, the same codepoint can look very different in
two different fonts...)
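
As a sketch of what "no semantics at all" means in practice, assuming
names are NUL-terminated UTF-8 octet strings: the comparison is a plain
byte comparison. The function name same_filename is hypothetical, made
up for this illustration.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: two names denote the same file name only if
 * they are the same sequence of octets. No case folding, no Unicode
 * normalization, no locale. */
bool same_filename(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

With this rule "README" and "readme" are two names, and so are
"caf\xC3\xA9" (U+00E9, precomposed) and "cafe\xCC\x81" (e followed by
U+0301, combining acute), even though a user sees the same text for
the last two.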

-- 
        Thierry Laronde <tlaronde +AT+ polynum +dot+ com>
                      http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C
