Re: A draft for a multibyte and multi-codepoint C string interface
On Mon, 15 Apr 2013 11:05:50 +0200
> If there are user level tools to filter the "ls" output to match the
> variations (accented, not accented; capitalized, not capitalized;
> ligatures, no ligatures), fine. But user level.
If I understand you correctly, the most important point in this
discussion is that the kernel must make no interpretation of the
bytes it stores as a name.
> what do you get for writing the ligature 'oe' in naming a resource
> 'oeuvres' instead of the plain letters?
What do you mean by "plain" letters? ASCII? Perhaps my example was
poorly chosen, because the "oe" ligature is only a custom. Too many
languages cannot be represented, even crudely, with ASCII.
What you get is the user's ability to name things in his native
language.
> What has this to do with a computer resource?
"There are only two hard problems in computer science: cache
invalidation and naming things."
I'm interested in usability. We already *permit* filenames to be
encoded with UTF-8, but we don't *support* them. We permit two
filenames in one directory whose letter sequence is identical if the
byte sequence differs. The sort order is arbitrary: "coeur" and
"cœur" don't sort next to each other, although they should. The user
has no way to know nor reason to care whether "året" uses four
Unicode code points or five. If he types "vi året", I think the file
should open if the character strings match regardless of the
byte-sequences, but today the odds are 1:4 against.
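The "four or five code points" ambiguity is Unicode normalization: NFC composes the "å" in "året" into a single code point, NFD decomposes it into "a" plus a combining ring above. A minimal Python sketch of the problem (illustrative only; nothing here is part of any proposed kernel interface):

```python
import unicodedata

nfc = "\u00e5ret"    # "året" precomposed: 4 code points
nfd = "a\u030aret"   # "året" decomposed (a + combining ring): 5 code points

# The two spellings render identically but are distinct strings,
# so a filesystem that compares bytes treats them as two names.
assert nfc != nfd
assert len(nfc) == 4 and len(nfd) == 5

# Normalizing both to a single form makes them compare equal.
assert unicodedata.normalize("NFC", nfd) == nfc
assert unicodedata.normalize("NFD", nfc) == nfd
```

This is exactly why "vi året" fails three times out of four today: the shell hands the kernel whichever byte sequence the input method produced, and the kernel compares bytes, not characters.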
Who considers that state of affairs good?
I'm confident that glob(3) could be adapted to Unicode, that open(2)
could canonicalize, that ffs could be changed to reflect the encoding,
and mount(2) to enforce it. That's just a small matter of
programming. For it to happen, though, we need consensus that it's
good and necessary. A consensus that seems surprisingly hard to reach.
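As a rough user-space illustration of what a canonicalizing open(2) would do, here is a sketch in Python. The helper name and the choice of NFC as the canonical form are my own assumptions, not anything proposed on this list:

```python
import unicodedata

def open_canonical(path, *args, **kwargs):
    # Hypothetical stand-in for a canonicalizing open(2): normalize the
    # name to NFC before passing it on, so the decomposed spelling
    # "a\u030aret" and the precomposed "\u00e5ret" reach the filesystem
    # as the same byte sequence.
    return open(unicodedata.normalize("NFC", path), *args, **kwargs)
```

With this, opening "året" succeeds no matter which of the equivalent code-point sequences the user typed, which is the behavior argued for above. Doing it in the kernel (with ffs recording the encoding and mount(2) enforcing it) is the same idea, minus the race conditions and coverage gaps of a libc-level shim.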