Subject: Re: Unicode support in iso9660.
To: None <tech-kern@NetBSD.org>
From: der Mouse <mouse@Rodents.Montreal.QC.CA>
List: tech-kern
Date: 11/19/2004 17:50:57
>> Thus, if UTF-8 was only allowed for system call interface, all
>> people except who only use US-ASCII got catastrophic results.
> Well, there would be a flag day, we would have to think about
> find|iconv conversion scripts for on-disk representations in FFS, but
> most other stuff could be dealt with inside libc.

It also means that the maximum pathname component length will drop from
255 to some unpredictably lower value depending on the exact content of
the pathname component.

What do you propose to do with existing pathname components that
exceed 255 octets in UTF-8 (for whatever character set you consider the
original octet string as being in)?
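
To make the arithmetic concrete, here is a minimal Python sketch.  It
assumes a 255-octet limit per component and that names would be stored
as UTF-8 after conversion; the helper names (utf8_len, max_chars) and
the choice of ISO-8859-1 as the "original" character set are
illustrative assumptions, not anything proposed in this thread.

```python
NAME_MAX = 255  # octets allowed in one pathname component


def utf8_len(name: str) -> int:
    """Number of octets the name occupies once encoded as UTF-8."""
    return len(name.encode("utf-8"))


def max_chars(ch: str, limit: int = NAME_MAX) -> int:
    """How many copies of the single character ch fit in limit octets."""
    return limit // utf8_len(ch)


# The usable length in characters depends on which characters are used.
print(max_chars("a"))           # 1 octet each  -> 255
print(max_chars("\u00e9"))      # 2 octets each -> 127
print(max_chars("\u20ac"))      # 3 octets each -> 85
print(max_chars("\U0001f600"))  # 4 octets each -> 63

# Assumed example: an existing 200-octet name, read as ISO-8859-1 and
# consisting entirely of 0xE9 (e-acute), converted to UTF-8.
legacy = b"\xe9" * 200
converted = legacy.decode("latin-1").encode("utf-8")
print(len(legacy), len(converted), len(converted) > NAME_MAX)
```

The last line shows a name that was legal before conversion (200
octets) and is not afterwards (400 octets).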

>> The way also conflicts with the locale concept that applications
>> only have to use single codeset (i.e. the codeset of current
>> locale).
> That still would be true, but underneath libc would transform names
> to the proper kernel representation.

...and then the maximum pathname component length *will* be hard for
applications to predict, because it will depend not only on the octet
string passed to the call but also on the locale in use.  This will be
especially problematic for applications that use filenames which are
data octet strings rather than character strings.  (I've written code
that creates filenames by expressing large integers in base 254.)
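
As an illustration of that last point, here is a minimal Python sketch
of a base-254 naming scheme and of what a transcoding libc would do to
it.  This is not the code referred to above.  The choice of NUL and '/'
as the two excluded octet values is an assumption, and kernel_length is
a hypothetical stand-in for a libc that converts from the locale's
codeset to UTF-8.

```python
# Assumption: the 254 digit values map to every octet except NUL and '/'.
ALPHABET = bytes(b for b in range(256) if b not in (0x00, 0x2F))


def encode_base254(n: int) -> bytes:
    """Express a non-negative integer as base-254 digits, one octet each.

    A real scheme would also have to avoid producing "." and "..";
    that is ignored here.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    digits = bytearray()
    while True:
        n, d = divmod(n, 254)
        digits.append(ALPHABET[d])
        if n == 0:
            break
    digits.reverse()
    return bytes(digits)


def kernel_length(name: bytes, locale_codeset: str) -> int:
    """Octets stored if the name is transcoded from locale_codeset to UTF-8."""
    return len(name.decode(locale_codeset).encode("utf-8"))


# Largest value that fits in 255 base-254 digits: every digit is 253,
# which maps to octet 0xFF.
name = encode_base254(254 ** 255 - 1)
print(len(name))                       # 255 octets as passed to the call
print(kernel_length(name, "latin-1"))  # 510 octets after transcoding

try:
    kernel_length(name, "utf-8")
except UnicodeDecodeError:
    print("not a valid character string in a UTF-8 locale")
```

The same 255-octet argument doubles in size under an ISO-8859-1
locale and is rejected outright under a UTF-8 one, so the application
cannot tell from the octet string alone whether the name will fit.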

/~\ The ASCII				der Mouse
\ / Ribbon Campaign
 X  Against HTML	       mouse@rodents.montreal.qc.ca
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B