tech-kern: Re: Unicode support in iso9660.

Subject: Re: Unicode support in iso9660.
To: None <tech-kern@NetBSD.org>
From: der Mouse <mouse@Rodents.Montreal.QC.CA>
List: tech-kern
Date: 11/19/2004 18:04:28

>>> I think this could be handled if UTF-8 were the standard encoding
>>> for userland<->kernel interaction, yes?
>> I think this would be a mistake.  File names have traditionally been
>> opaque octet sequences containing 0x2f only as pathname component
>> separator and containing 0x00 only as terminator, not character
>> sequences, and I think changing that would be a Wrong Thing.
> UTF-8 doesn't change that.

Ah, but UTF-8 does change that: it means that lots of octet sequences
that were perfectly good under the previous paradigm (eg, 0xaa 0xaa
0xaa 0xaa 0xaa 0xaa 0xcc 0x42 - 牧牧牧夌 if you're using 8859-1) are
now invalid.  Furthermore, the mapping between characters and octet
streams either has to be pushed out to the application (which means
changing almost every file-accessing program in existence) or has to be
hidden in libc (which means it is probably implicit, by way of what
locale the creating program happens to be running in, and thus two
filenames that used to refer to the same file may well silently stop
doing so).
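
To make the first point concrete, here is a small Python sketch (an
illustration, not anything from an existing implementation) that checks
the example octet string under both interpretations.  The helper name
is_valid_utf8 is made up for this example.

```python
# The octet sequence from the example above, treated as an opaque filename.
name = bytes([0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xcc, 0x42])

def is_valid_utf8(octets: bytes) -> bool:
    """Hypothetical helper: True if the octets decode as strict UTF-8."""
    try:
        octets.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

# Traditional rule: any octets are fine except 0x2f and 0x00.
traditionally_ok = 0x2f not in name and 0x00 not in name

# Under 8859-1 every octet maps to some character, so the name decodes.
as_latin1 = name.decode("latin-1")

# Under UTF-8, 0xaa is a continuation octet with no lead octet before it,
# so the very same name is rejected.
utf8_ok = is_valid_utf8(name)
```

The name passes the traditional test and decodes under 8859-1, but
is_valid_utf8 rejects it, which is exactly the class of name that would
become unrepresentable.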

Does POSIX say anything about whether the octet sequences seen by the
application as pathname components may be implicitly paired with
information such as the current locale when determining what other
filenames they may match?  That is, if one program creates a file whose
name consists of one "character", 0xaa, must another program opening
the same name get the same file, or may it get a different file
depending on other state (such as what locales the programs were/are
running under)?
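
A sketch of the scenario I am asking about, assuming a hypothetical libc
that converts names to octets according to the caller's locale.  The
function name_octets and the use of codec names to stand in for locales
are both illustrative assumptions, not a description of any real libc.

```python
def name_octets(name: str, locale_encoding: str) -> bytes:
    """Hypothetical: the octets a locale-converting libc would pass down."""
    return name.encode(locale_encoding)

# The single character that 8859-1 stores as the octet 0xaa.
char = "\u00aa"

# Creator runs in an 8859-1 locale, opener runs in a UTF-8 locale.
created_as = name_octets(char, "latin-1")
opened_as = name_octets(char, "utf-8")

# If the kernel still compares names as opaque octets, these must match
# for both programs to reach the same file.
same_file = created_as == opened_as
```

Here the creator hands down 0xaa and the opener hands down 0xc2 0xaa, so
under octet comparison the two programs would not be naming the same
file.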

/~\ The ASCII				der Mouse
\ / Ribbon Campaign
 X  Against HTML	       mouse@rodents.montreal.qc.ca
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B