Re: A draft for a multibyte and multi-codepoint C string interface

To: tech-userlevel%netbsd.org@localhost
Subject: Re: A draft for a multibyte and multi-codepoint C string interface
From: "James K. Lowden" <jklowden%schemamania.org@localhost>
Date: Tue, 2 Apr 2013 20:01:42 -0400

On Tue, 2 Apr 2013 18:08:01 +0200
tlaronde%polynum.com@localhost wrote:

> UTF-8 has the same role as UTC time. There is one and only one
> canonical representation, fixed. And the display of the information
> is customized according to user level rules.

UTC is a simpler problem.  In UTF-8, the same sequence of characters may
be represented by more than one sequence of bytes, for example a
precomposed letter versus a base letter plus a combining mark.  And, while
NetBSD may prevent non-canonical sequences in filenames, it must be able
to mount and cope with filesystems that were not so carefully managed by
other systems.
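
To make the point concrete, here is a minimal sketch in Python (chosen
only because its standard library ships Unicode normalization; the
variable names are illustrative).  It shows one user-visible character
with two valid UTF-8 byte sequences that compare equal only after
normalization.  NFC is picked arbitrarily as the canonical form.

```python
import unicodedata

# Two spellings of the same user-visible character.
precomposed = "\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT

precomposed_bytes = precomposed.encode("utf-8")
decomposed_bytes = decomposed.encode("utf-8")

# A byte-for-byte comparison treats them as different strings.
bytes_equal = precomposed_bytes == decomposed_bytes

# After normalizing both to NFC (an assumed choice of canonical form),
# the two spellings compare equal.
nfc_equal = (unicodedata.normalize("NFC", precomposed)
             == unicodedata.normalize("NFC", decomposed))

print(precomposed_bytes, decomposed_bytes, bytes_equal, nfc_equal)
```

Both byte sequences are well-formed UTF-8, so validating the encoding
alone does not make names unique.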

> So that the kernel interface should take and give UTF-8, and that 
> filesystem drivers should take and give UTF-8, user level utilities
> converting from the current encoding to unicode and UTF-8.
> 
> But that's all. If one user really wants to take into account
> acrobatics about collating sequences and the like, he can use/develop
> a program to do so.

You can't fob it off on userspace.  At least I don't think so.

Consider open(2).  Every element in the pathname needs
canonicalization.  OK, userspace can do that.  But what if the
filesystem doesn't conform?  Say, because it's a CD-ROM, or a camera,
never mind NFS/sshfs/Samba/puffs.
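
The userspace half might look like the following sketch.  The helper
name canonicalize_path, the example path, and the choice of NFC are all
assumptions for illustration, not an existing interface.  Note what it
does not do: it only rewrites the name the caller passes in, and says
nothing about how names are stored on the volume.

```python
import unicodedata

def canonicalize_path(path: str, form: str = "NFC") -> str:
    """Normalize each pathname element separately, keeping "/" separators."""
    return "/".join(unicodedata.normalize(form, element)
                    for element in path.split("/"))

# Hypothetical path whose last element uses a combining accent.
requested = "/mnt/camera/cafe\u0301.jpg"
canonical = canonicalize_path(requested)

print(canonical.encode("utf-8"))
```

If the camera wrote the decomposed form to its directory, the
canonicalized request no longer matches byte-for-byte, which is exactly
the non-conforming case.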

ISTM that to open a file, the kernel needs a more sophisticated
definition of string equality than a byte-for-byte comparison.  At the
very least, it has to be able to canonicalize extant names on the disk,
and to deal somehow with duplicates.  
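
As a rough illustration of that kind of equality (written as a
userspace Python sketch, not kernel code; build_index, lookup, and the
directory contents are hypothetical), a lookup could compare names
after normalizing both sides, and flag on-disk entries that collapse to
the same canonical name:

```python
import unicodedata
from collections import defaultdict

def build_index(raw_names):
    """Map each NFC-canonical name to the raw on-disk names that share it."""
    index = defaultdict(list)
    for raw in raw_names:
        name = raw.decode("utf-8")  # assumes valid UTF-8; invalid bytes not handled
        index[unicodedata.normalize("NFC", name)].append(raw)
    return index

def lookup(index, requested):
    """Return the raw names equal to `requested` under NFC equivalence."""
    return index.get(unicodedata.normalize("NFC", requested), [])

# Hypothetical directory from a foreign filesystem: the first two entries
# differ only in normalization form.
on_disk = [b"caf\xc3\xa9.txt", b"cafe\xcc\x81.txt", b"notes.txt"]
index = build_index(on_disk)

matches = lookup(index, "caf\u00e9.txt")
duplicates = {name: raws for name, raws in index.items() if len(raws) > 1}

print(matches)
print(duplicates)
```

The sketch only detects the duplicates.  Which of two colliding entries
open(2) should return is the policy question left open above.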

--jkl

