tech-userlevel archive


Re: A draft for a multibyte and multi-codepoint C string interface

>(I don't know what other OSs have done; it doesn't seem like much.  A
>quick trip to Google and through the docs on my local Ubuntu system shows
>that bash and glob(7) provide for "equivalent" sequences, e.g. "[=a=]"
>matches most things that look like "a".  They are silent on the
>question of various codepoint sequences for "å".)

A quick note:

On Mac OS X all filenames are UTF-8 in NFD (so they're all decomposed).
Composed codepoints in a filename are decomposed into their base character
and combining characters.
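
To make the decomposition concrete, here is a minimal sketch using the
standard NFD form from Python's unicodedata module.  It only illustrates
what "decomposed" means for "å"; the tables a given filesystem uses may
not match standard NFD in every detail, and the variable names are just
for illustration.

```python
import unicodedata

composed = "\u00e5"  # "å" as a single precomposed codepoint
decomposed = unicodedata.normalize("NFD", composed)

# NFD splits "å" into "a" (U+0061) followed by COMBINING RING ABOVE (U+030A).
codepoints = [f"U+{ord(ch):04X}" for ch in decomposed]

# The two spellings look identical on screen but are different byte strings.
composed_bytes = composed.encode("utf-8")      # 2 bytes
decomposed_bytes = decomposed.encode("utf-8")  # 3 bytes

print(codepoints, composed_bytes, decomposed_bytes)
```

A filesystem that stores only the decomposed form hands back the 3-byte
spelling even if the file was created with the 2-byte one.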

I believe under Solaris if you mount with a special Unicode option you
can use either composed or decomposed and the original byte sequence is
used as the filename, but you can't create two files that have the same
normalization (or maybe they are treated as the same filename; I'm a
little unclear on the exact details).
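
As a rough illustration of that kind of behavior (not the actual Solaris
implementation, whose details I am unsure of), here is a toy directory
that compares names by their normalized form but keeps the original
spelling.  The class name and the choice to refuse a second create are
assumptions made for the sketch; the other possibility is to treat both
spellings as the same file.

```python
import unicodedata


class NormalizationPreservingDir:
    """Toy directory: names compare by NFC form but keep their original spelling."""

    def __init__(self):
        self._entries = {}  # NFC key -> name exactly as first created

    def create(self, name):
        key = unicodedata.normalize("NFC", name)
        if key in self._entries:
            # Assumed policy: a second name with the same normalization is refused.
            raise FileExistsError(name)
        self._entries[key] = name

    def lookup(self, name):
        # Either spelling finds the entry; the stored spelling is returned unchanged.
        return self._entries.get(unicodedata.normalize("NFC", name))


d = NormalizationPreservingDir()
d.create("a\u030a.txt")          # decomposed spelling is stored untouched
stored = d.lookup("\u00e5.txt")  # composed spelling finds the same entry
```

Here the name the user typed is never rewritten; only the comparison key
is normalized.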

Personally, I prefer the latter behavior; I think it's damn unfriendly
to create a file and have its name changed by the
filesystem.  I understand why it was done, but I still think the Solaris
behavior is better.

