Subject: Re: Unicode support in iso9660.
To: None <tech-kern@NetBSD.org>
From: der Mouse <mouse@Rodents.Montreal.QC.CA>
List: tech-kern
Date: 11/21/2004 15:39:32
>> Ah, but UTF-8 does change that: it means that lots of octet
>> sequences that were perfectly good under the previous paradigm
>> (eg,[...]) are now invalid.
> I mean - if a file name is encoded into UTF-8, an encoding-agnostic app
> can handle it the same way any non-UTF8 filename is handled,

Yes, for handling existing filenames that's true.  It's not true for
taking octet strings obtained from elsewhere and using them as
filenames.  (If the UTF8ness is exposed to the application, many octet
strings that used to be fine are newly illegal.  If it is not, which
octet strings are legal depends on the locale - eg, in 8859-1, any
string containing an octet in the 0x80-0x9f range is invalid, as is any
pathname component that is under 256 bytes but transforms into a UTF8
string over 256 bytes.)
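
A minimal sketch of both cases, in Python.  The helper names are made up
for illustration, and the 255-byte component limit is an assumption
(a typical NAME_MAX), not something any particular filesystem promises.

```python
# Assumed per-component limit (a typical NAME_MAX); adjust for the real system.
NAME_MAX = 255

def is_valid_utf8(name: bytes) -> bool:
    """True if the octet string decodes as strict UTF-8."""
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

def utf8_length_from_latin1(name: bytes) -> int:
    """Byte length after reading the octets as 8859-1 and re-encoding as UTF-8."""
    return len(name.decode("latin-1").encode("utf-8"))

# A lone 0xE9 (e-acute in 8859-1) is a fine octet-string name, but not UTF-8.
print(is_valid_utf8(b"caf\xe9"))      # False
print(is_valid_utf8(b"caf\xc3\xa9"))  # True

# 200 octets of 0xE9 fit within NAME_MAX, but need 400 bytes as UTF-8.
latin1_name = b"\xe9" * 200
print(len(latin1_name) <= NAME_MAX, utf8_length_from_latin1(latin1_name))  # True 400
```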

> '/' is still a slash and there is still just a single 0x00 on the end.
> I.e., UTF8 file names are fully compatible with the way UNIX systems
> work.

No.  They're not grossly incompatible, but they're very far from fully
compatible.  See above for examples.

> The compatibility problem you describe is only present if the
> application interprets the file name it gets.

Depends.  For programs that get filenames only from existing entries in
the filesystem, and don't interpret them, you are correct.  But that
isn't all that many programs, and for others - those that create
filenames de novo, or those that get them from somewhere else (such as
user input or embedded in a data stream like a tar archive) - this
suddenly breaks a whole lot of formerly valid and useful filenames,
compelling all such programs to make the same paradigm shift from
pathnames as octet sequences to pathnames as character sequences.
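
To make that concrete, here is a rough sketch (Python again) of the check
such a program would have to grow.  The two rule functions and the sample
names are hypothetical, and the 255-byte limit is assumed; the point is
only that names passing the octet-sequence rule can fail the
character-sequence one.

```python
def ok_as_octets(component: bytes) -> bool:
    """Traditional rule: non-empty, no '/' or NUL, within an assumed 255-byte limit."""
    return 0 < len(component) <= 255 and b"/" not in component and b"\x00" not in component

def ok_as_utf8(component: bytes) -> bool:
    """Hypothetical stricter rule: the traditional rule plus well-formed UTF-8."""
    if not ok_as_octets(component):
        return False
    try:
        component.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

# Names as they might arrive from a tar archive or user input (made-up examples).
incoming = [b"README", b"r\xe9sum\xe9.txt", b"r\xc3\xa9sum\xc3\xa9.txt", b"\xff\xfe"]

# Formerly valid names that the UTF-8 rule newly rejects.
broken = [n for n in incoming if ok_as_octets(n) and not ok_as_utf8(n)]
print(broken)  # [b'r\xe9sum\xe9.txt', b'\xff\xfe']
```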

/~\ The ASCII				der Mouse
\ / Ribbon Campaign
 X  Against HTML	       mouse@rodents.montreal.qc.ca
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B