tech-userlevel archive


Re: A draft for a multibyte and multi-codepoint C string interface



>> On MacOS X all filenames are UTF-8, NFD (so they're all decomposed).
>> Composed codepoints in a filename are decomposed into their base
>> character and combining character.
>> 
>> I believe under Solaris if you mount with a special Unicode option you
>> can use either composed or decomposed and the original byte sequence
>> is used as the filename, but you can't create two files that have the
>> same normalization 
>
>In the systems you're describing, the OS prohibits certain kinds of
>duplication, defined as two names sharing the same normalized byte
>sequence.  How do they deal with duplicates when mounting filesystems
>that permit them?  

I was a bit vague on the implementation details, so let me explain the
specifics.  Technically, in both cases this isn't done OS-wide (it's
not in the VFS layer); it's done on a per-filesystem basis.  In MacOS X
this is done by HFS-specific code; it doesn't apply to other filesystems.
My understanding is that the same thing happens on Solaris for local
filesystems, but you have to enable it with a special mount option.

Hm, okay, so let me revise that last bit ... for Solaris, that's only
for ZFS, and it's a dataset property; it looks like that property is one
you can only set at filesystem creation time.  So anyway, the point is
that this is done per-filesystem, and those filesystems are defined so
that there is never a "duplicate" name; they can't exist.  Obviously if
you use a remote filesystem that doesn't have those properties then the
OS just treats names as a byte sequence (or does whatever is defined for
the filesystem in question).  As for whether or not this is a good idea,
I will remain silent on that issue.
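
To make the "duplicate" notion concrete, here is a minimal sketch in
Python of the comparison being described: two names count as the same
if they normalize to the same byte sequence.  The helper names
(name_key, find_collisions) are made up for illustration, and plain
NFD from Python's unicodedata module is an assumption; a real
filesystem may use its own variant of the decomposition tables.

```python
import unicodedata

def name_key(name: str, form: str = "NFD") -> bytes:
    """Return the UTF-8 bytes of a name after Unicode normalization.

    Hypothetical helper: stands in for whatever key a normalizing
    filesystem would compare names by.
    """
    return unicodedata.normalize(form, name).encode("utf-8")

def find_collisions(names, form="NFD"):
    """Group names that normalize to the same byte sequence."""
    groups = {}
    for n in names:
        groups.setdefault(name_key(n, form), []).append(n)
    # Only groups with more than one member are "duplicates".
    return [g for g in groups.values() if len(g) > 1]

composed = "caf\u00e9"      # 'e' with acute as one codepoint (U+00E9)
decomposed = "cafe\u0301"   # 'e' followed by combining acute (U+0301)

if __name__ == "__main__":
    print(composed == decomposed)                       # different codepoints
    print(name_key(composed) == name_key(decomposed))   # same normalized bytes
    print(find_collisions([composed, decomposed, "tea"]))
```

A filesystem that stores the normalized form (the HFS case above) would
keep only the output of something like name_key; one that preserves the
original bytes but rejects collisions (the ZFS case) would keep the
name as given and use the key only for lookup and comparison.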

--Ken

