Subject: Re: Unicode support in iso9660.
To: None <tech-kern@netbsd.org>
From: der Mouse <mouse@Rodents.Montreal.QC.CA>
List: tech-kern
Date: 11/22/2004 13:45:48
>> Yes, this way of looking at it pushes a significant fraction of the
>> burden off onto humans.  However, nothing else I've seen both
>> supports human levels of flexibility in encoding (such as having
>> KOI-8 and 8859-1 file names in the same directory,
> Why to have a flexibility in encoding?  One encoding which can
> represent all the possible characters should be enough.

In a word, history.

If we had no filenames today, nor code to work with them, going with
UTF-8 might not be too wrong a thing.  Applications needing to encode
non-character filenames for storage could generate sequences of 16-bit
blobs instead of sequences of 8-bit blobs, and those sequences could then
be encoded a second time for storage, treating each blob as a
Unicode/10646 codepoint and representing the result as 8-bit blobs.  (I'd
prefer to just store sequences of 16-bit blobs, rather than trying to
compress them into an 8-bit secondary encoding, but the 8-bit byte is
an even more entrenched standard.)
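A minimal sketch of that two-level scheme in Python.  The names
encode_blobs/decode_blobs are hypothetical, and the choices are my
assumptions, not anything a filesystem actually does.  Each 16-bit blob
is treated as a codepoint and the result is stored as UTF-8 octets.
Values in U+D800..U+DFFF are surrogates, which strict UTF-8 refuses, so
the sketch uses Python's "surrogatepass" error handler to keep every
16-bit value round-trippable.  It does not keep NUL or '/' out of the
stored octets, which a real Unix filesystem would also require.

```python
def encode_blobs(blobs):
    """Store a sequence of 16-bit blobs as UTF-8 octets (hypothetical helper)."""
    for b in blobs:
        if not 0 <= b <= 0xFFFF:
            raise ValueError(f"not a 16-bit value: {b!r}")
    # Treat each blob as a codepoint; "surrogatepass" lets U+D800..U+DFFF
    # through as 3-octet sequences instead of raising UnicodeEncodeError.
    return "".join(chr(b) for b in blobs).encode("utf-8", "surrogatepass")


def decode_blobs(octets):
    """Recover the original 16-bit blobs from the stored octets."""
    # Lone surrogates decode back to single code units, so ord() gives
    # back exactly the values that went in.
    return [ord(c) for c in octets.decode("utf-8", "surrogatepass")]


if __name__ == "__main__":
    blobs = [0x0041, 0x00E9, 0xD800, 0xFFFF]
    stored = encode_blobs(blobs)
    print(stored)                         # b'A\xc3\xa9\xed\xa0\x80\xef\xbf\xbf'
    print(decode_blobs(stored) == blobs)  # True
```

Note the cost: a blob that would fit in two octets as raw 16 bits takes
up to three once it goes through UTF-8.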

But we have a very large installed base that disagrees with that.
Approximately all programs treat filenames as octet sequences, both on
generation and on retrieval (while this is starting to change, with
things like GTK, legacy filenames and filename-manipulating programs
will be with us for the foreseeable future).

I for one am not prepared to throw that installed base away.

>> and what doesn't) and permits use of filenames that fundamentally
>> _aren't_ character sequences.
> OK, in that case don't expect the applications other than the one
> which created them to do anything meaningful with them.

I mostly don't.  Most such applications try to choose octet sequences
that are manipulable as characters using some encoding (printable ASCII
probably being the commonest, it being the closest thing we have to a
subset supported by all display and input tools).
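One common way to get there, sketched in Python.  The helper names are
hypothetical and this is not any particular program's method: hex-encode
the arbitrary octets so the stored name uses only printable ASCII.

```python
def blob_to_filename(blob: bytes) -> str:
    """Map arbitrary octets to a filename made only of 0-9 and a-f."""
    return blob.hex()


def filename_to_blob(name: str) -> bytes:
    """Invert blob_to_filename."""
    return bytes.fromhex(name)


if __name__ == "__main__":
    name = blob_to_filename(b"\x00/\xff")
    print(name)                    # 002fff
    print(filename_to_blob(name))  # b'\x00/\xff'
```

Octets that a filesystem cannot store in a name, such as NUL and '/',
come out as harmless characters.  The price is doubling the length, and
a denser encoding such as base64 with a '/'-free alphabet would cost less.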

/~\ The ASCII				der Mouse
\ / Ribbon Campaign
 X  Against HTML	       mouse@rodents.montreal.qc.ca
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B