Subject: Re: Unicode support in iso9660.
To: der Mouse <mouse@Rodents.Montreal.QC.CA>
From: Jaromir Dolecek <jdolecek@NetBSD.org>
Date: 11/21/2004 23:58:46
der Mouse wrote:
> strings are legal depends on the locale - eg, in 8859-1, any string
> containing an octet in the 0x80-0x9f range is invalid, as are pathname
> components under 256 bytes which transform into UTF8 strings over 256
Well, 0x80-0x9f _are_ valid characters (the C1 controls), just not
printable. As for the file name length overflow case, while possible,
it's rare enough not to be worth considering IMO.
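To put numbers on both points, here is a minimal Python sketch (not
kernel code). The 255-octet limit and the helper names are assumptions
made for illustration only. Every ISO-8859-1 octet at or above 0x80
takes two octets in UTF-8, so a name that fits before conversion can
stop fitting afterwards.

```python
import unicodedata

NAME_MAX = 255  # assumed per-component limit, for illustration only


def utf8_length(latin1_name: bytes) -> int:
    """Octet count of an ISO-8859-1 name after transcoding to UTF-8."""
    return len(latin1_name.decode("iso-8859-1").encode("utf-8"))


def overflows(latin1_name: bytes) -> bool:
    """True if the name fits as ISO-8859-1 but not once it is UTF-8."""
    return len(latin1_name) <= NAME_MAX and utf8_length(latin1_name) > NAME_MAX


# 0x85 decodes without error; it is a control character (category "Cc"),
# i.e. valid but not printable.
c1 = b"\x85".decode("iso-8859-1")
print(unicodedata.category(c1), c1.isprintable())

# 200 octets of 0xe9 become 400 octets of UTF-8.
name = b"\xe9" * 200
print(len(name), utf8_length(name), overflows(name))
```

A name has to be made up mostly of non-ASCII octets and already be
longer than half the limit before this triggers, which is why I call
it rare.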
> Depends. For programs that get filenames only from existing entries in
> the filesystem, and don't interpret them, you are correct. But that
> isn't all that many programs, and for others - those that create
> filenames de novo, or those that get them from somewhere else (such as
> user input or embedded in a data stream like a tar archive) - this
> suddenly breaks a whole lot of formerly valid and useful filenames,
> compelling all such programs to make the same paradigm shift from
> pathnames as octet sequences to pathnames as character sequences.
That's unfortunate, but arguably inevitable. If the file system does
assume some internal encoding, there is pretty much no way to
escape that, and non-conforming file names must be refused. We only
get away with this in msdosfs because we pretend all file names are
in ISO-8859-1, which makes msdosfs appear encoding-agnostic.
Unfortunately, this is not interoperable when the charset actually in
use is something other than ISO-8859-1. Changing msdosfs to use UTF-8
would break it for people happily using ISO-8859-1, however. It
appears it would be necessary after all to have an option to
present/accept file names in a selected 8-bit encoding rather than
UTF-8, and probably even to default to the ISO-8859-1 mapping to
preserve current behaviour.
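
As a rough sketch of what such an option could look like, here is
some Python. This is not the msdosfs code; the function names and the
"encoding" parameter are hypothetical stand-ins for a mount option.
Names from userland are decoded with the selected encoding into the
file system's internal Unicode form, and names that do not conform
are refused.

```python
def name_to_unicode(raw: bytes, encoding: str = "iso-8859-1") -> str:
    """Decode a userland name into the internal Unicode form.

    Raises ValueError (standing in for an error return) when the
    octets are not valid in the selected encoding.
    """
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as exc:
        raise ValueError(f"name is not valid {encoding}") from exc


def unicode_to_name(name: str, encoding: str = "iso-8859-1") -> bytes:
    """Encode an on-disk Unicode name for presentation to userland.

    Raises ValueError when the name cannot be represented in the
    selected encoding.
    """
    try:
        return name.encode(encoding)
    except UnicodeEncodeError as exc:
        raise ValueError(f"name is not representable in {encoding}") from exc


# Default mapping: any octet sequence is accepted, as today.
print(ascii(name_to_unicode(b"caf\xe9")))

# With UTF-8 selected, the same octets are refused.
try:
    name_to_unicode(b"caf\xe9", "utf-8")
except ValueError as err:
    print("refused:", err)
```

With the ISO-8859-1 default the decode side never fails, so existing
users see no change; only the presentation side can fail, for on-disk
names containing characters outside that charset.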
Jaromir Dolecek <jdolecek@NetBSD.org> http://www.NetBSD.cz/
-=- We can walk our road together if our goals are all the same; -=-
-=- We can run alone and free if we pursue a different aim. -=-