Re: A draft for a multibyte and multi-codepoint C string interface

To: tech-userlevel%NetBSD.org@localhost
Subject: Re: A draft for a multibyte and multi-codepoint C string interface
From: Ken Hornstein <kenh%cmf.nrl.navy.mil@localhost>
Date: Sun, 31 Mar 2013 18:41:06 -0400

>(I don't know what other OSs have done; it doesn't seem like much.  A
>quick trip to Google and through the docs on my local Ubuntu system show
>that bash and glob(7) provide for "equivalent" sequences, e.g. "[=a=]"
>matches most things that look like "a".  They are silent on the
>question of various codepoint sequences for "å".)

A quick note:

On Mac OS X all filenames are UTF-8 in NFD (so they're all decomposed).
Composed codepoints in a filename are decomposed into their base character
and combining character(s).
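
To make the decomposition concrete, here is a small sketch in Python (my
choice of language for illustration; nothing here is Mac OS X specific).
It uses the standard unicodedata module to show what NFD does to a
composed "å"; the helper name codepoints() is made up for this example.

```python
import unicodedata

# Composed form: a single codepoint, LATIN SMALL LETTER A WITH RING ABOVE.
composed = "\u00e5"

# NFD splits it into a base letter plus a combining mark.
decomposed = unicodedata.normalize("NFD", composed)


def codepoints(s):
    """Hypothetical helper: list the codepoints of s as U+XXXX strings."""
    return [f"U+{ord(c):04X}" for c in s]


print(codepoints(composed))        # ['U+00E5']
print(codepoints(decomposed))      # ['U+0061', 'U+030A']

# The two forms render the same but are different byte sequences in UTF-8.
print(composed.encode("utf-8"))    # b'\xc3\xa5'
print(decomposed.encode("utf-8"))  # b'a\xcc\x8a'
```

So a name typed with the two-byte composed form would come back from the
filesystem as the three-byte decomposed form.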

I believe under Solaris if you mount with a special Unicode option you
can use either composed or decomposed and the original byte sequence is
used as the filename, but you can't create two files that have the same
normalization (or maybe they are treated as the same filename; I'm a
little unclear on the exact details).
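
As a rough illustration of that second model, here is a toy sketch in
Python. It is not how Solaris implements anything; ToyDirectory and its
methods are invented for this example, and I am assuming the variant in
which an equivalent name is found by lookup and a second create fails.
The point is only that the comparison key is normalized while the stored
name keeps its original form.

```python
import unicodedata


class ToyDirectory:
    """Toy model: names compare by normalized form, but are stored as given."""

    def __init__(self):
        # normalized key -> name exactly as the creator supplied it
        self._entries = {}

    @staticmethod
    def _key(name):
        # Assumption: NFC as the comparison form; any one form would do.
        return unicodedata.normalize("NFC", name)

    def create(self, name):
        key = self._key(name)
        if key in self._entries:
            raise FileExistsError(name)
        self._entries[key] = name

    def lookup(self, name):
        # Returns the stored (original) name, or None if absent.
        return self._entries.get(self._key(name))

    def listing(self):
        return list(self._entries.values())


d = ToyDirectory()
d.create("a\u030a")                     # decomposed form, kept as-is
assert d.listing() == ["a\u030a"]       # the name was not rewritten
assert d.lookup("\u00e5") == "a\u030a"  # composed form finds the same file
```

Calling d.create("\u00e5") after this raises FileExistsError, since both
spellings normalize to the same key.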

Personally, I prefer the latter behavior; I think it's damn unfriendly
to create a filename and have it changed by the
filesystem.  I understand why it was done, but I still think the Solaris
behavior is better.

--Ken
