Subject: Re: Unicode support in iso9660.
To: None <tech-kern@NetBSD.org>
From: der Mouse <mouse@Rodents.Montreal.QC.CA>
List: tech-kern
Date: 11/19/2004 13:20:19
> I think this could be handled if UTF-8 were the standard encoding for
> userland<->kernel interaction, yes?

I think this would be a mistake.  File names have traditionally been
opaque octet sequences, not character sequences: they contain 0x2f
only as the pathname component separator and 0x00 only as the
terminator.  I think changing that would be a Wrong Thing.
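
To make the traditional rule concrete, here is a minimal sketch in C.
The name component_ok is hypothetical, not an existing kernel routine,
and rejecting the empty component is my assumption.  Note that it never
asks which character set the octets are in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper, not a real kernel function: is buf[0..len)
 * usable as a single pathname component under the traditional rules?
 */
static bool
component_ok(const unsigned char *buf, size_t len)
{
	size_t i;

	assert(buf != NULL);
	if (len == 0)
		return false;		/* assumption: no empty components */
	for (i = 0; i < len; i++) {
		if (buf[i] == 0x2f || buf[i] == 0x00)
			return false;	/* separator or terminator */
	}
	return true;			/* any other octet value is fine */
}
```

With this, the four octets 63 61 66 e9 are an acceptable name, whatever
encoding (if any) their creator had in mind.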

Of course, the filesystem needs to be allowed to reject names that are
unsuitable for it (e.g., names that aren't 8.3 for a DOS filesystem), and
for filesystems that do specify filenames as character sequences in a
particular encoding, names not fitting that encoding should be
rejected.  But that is not at all the same as making a fundamental
change from octet sequences to character sequences.  Suddenly
everything, not excepting the kernel, needs to understand character
sets, because a whole lot of data that previously could be treated
blindly as octet sequences suddenly needs to be tagged with its
associated character set.  If any - it's not at all clear what tag to
apply in code that uses filenames in non-character ways.
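
As a rough illustration of per-filesystem rejection, the sketch below
checks a name against a simplified 8.3 pattern.  The function names and
the permitted character set are assumptions for illustration (real FAT
allows some further punctuation, and a real implementation may fold
case rather than reject it).  The point is only that the check lives in
the filesystem and still operates on octets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified: real FAT permits a few more punctuation characters. */
static bool
dos_char_ok(unsigned char c)
{
	return (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9') ||
	    c == '_' || c == '-';
}

/*
 * Hypothetical per-filesystem check: does the octet sequence fit an
 * 8.3 pattern (1-8 base characters, optionally '.' and 1-3 more)?
 */
static bool
dos83_name_ok(const unsigned char *buf, size_t len)
{
	size_t i, base = 0, ext = 0;
	bool seen_dot = false;

	assert(buf != NULL);
	for (i = 0; i < len; i++) {
		if (buf[i] == '.') {
			if (seen_dot)
				return false;	/* at most one dot */
			seen_dot = true;
		} else if (!dos_char_ok(buf[i])) {
			return false;		/* outside the allowed set */
		} else if (seen_dot) {
			ext++;
		} else {
			base++;
		}
	}
	if (base < 1 || base > 8)
		return false;
	if (seen_dot)
		return ext >= 1 && ext <= 3;
	return true;
}
```

Under these assumptions README.TXT is accepted and LONGFILENAME.TXT is
rejected, and nothing outside the DOS filesystem code had to know why.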

> I think UTF-8 could also handle this, which is great because it
> doesn't require any changes to the UFS on-disk representation.

It requires changes to the UFS on-disk specification, because it
renders existing pathname components which are invalid UTF-8 sequences
illegal.  It most certainly changes how filenames are represented
on-disk when those filenames are not thought of as UTF-8 by the entity
that produced them.
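
To illustrate, here is a sketch of a strict UTF-8 validity check.  The
name utf8_ok is hypothetical, and the strictness (rejecting overlong
forms, surrogates, and values above U+10FFFF) follows my reading of the
UTF-8 definition, not anything UFS does today.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: is s[0..len) a well-formed UTF-8 sequence? */
static bool
utf8_ok(const unsigned char *s, size_t len)
{
	size_t i = 0, n, k;
	unsigned long cp, min;

	assert(s != NULL);
	while (i < len) {
		unsigned char c = s[i];

		if (c < 0x80) {			/* plain ASCII octet */
			i++;
			continue;
		} else if ((c & 0xe0) == 0xc0) {
			n = 1; cp = c & 0x1f; min = 0x80;
		} else if ((c & 0xf0) == 0xe0) {
			n = 2; cp = c & 0x0f; min = 0x800;
		} else if ((c & 0xf8) == 0xf0) {
			n = 3; cp = c & 0x07; min = 0x10000;
		} else {
			return false;	/* stray continuation or bad lead */
		}
		if (len - i - 1 < n)
			return false;	/* sequence is truncated */
		for (k = 1; k <= n; k++) {
			if ((s[i + k] & 0xc0) != 0x80)
				return false;	/* not a continuation octet */
			cp = (cp << 6) | (s[i + k] & 0x3f);
		}
		if (cp < min || cp > 0x10ffff ||
		    (cp >= 0xd800 && cp <= 0xdfff))
			return false;	/* overlong, too large, surrogate */
		i += n + 1;
	}
	return true;
}
```

The octets 63 61 66 e9 (a name someone created thinking in Latin-1)
fail this check, while 63 61 66 c3 a9 pass.  Both are perfectly good
names on UFS as it stands; requiring UTF-8 outlaws the first.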

/~\ The ASCII				der Mouse
\ / Ribbon Campaign
 X  Against HTML	       mouse@rodents.montreal.qc.ca
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B