Subject: Re: Unicode support in iso9660.
To: None <tech-kern@NetBSD.org>
From: der Mouse <mouse@Rodents.Montreal.QC.CA>
Date: 11/19/2004 13:20:19
> I think this could be handled if UTF-8 were the standard encoding for
> userland<->kernel interaction, yes?
I think this would be a mistake. File names have traditionally been
opaque octet sequences, containing 0x2f only as the pathname component
separator and 0x00 only as the terminator, not character sequences,
and I think changing that would be a Wrong Thing.
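To make the traditional rule concrete, here is a minimal sketch of what
"opaque octet sequence" means for a single pathname component. The
function name component_ok is hypothetical (it is not an existing
NetBSD kernel routine), and the empty-name check is an added assumption;
the point is only that nothing here needs to know about character sets.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper, not a NetBSD kernel function: a pathname
 * component is acceptable iff it is a non-empty octet string that
 * contains neither 0x2f ('/') nor 0x00.  No other octet value is
 * interpreted, so no character set knowledge is needed.
 */
static bool
component_ok(const unsigned char *name, size_t len)
{
	if (len == 0)
		return false;
	for (size_t i = 0; i < len; i++) {
		if (name[i] == 0x2f || name[i] == 0x00)
			return false;
	}
	return true;
}
```

Under this rule a name such as the four octets 0x63 0x61 0x66 0xe9 is
acceptable, whatever encoding (if any) its creator had in mind.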
Of course, the filesystem needs to be allowed to reject names that are
unsuitable for it (e.g., names that aren't 8.3 for a DOS filesystem), and
for filesystems that do specify filenames as character sequences in a
particular encoding, names not fitting that encoding should be
rejected. But that is not at all the same as making a fundamental
change from octet sequences to character sequences. Suddenly
everything, not excepting the kernel, needs to understand character
sets, because a whole lot of data that previously could be treated
blindly as octet sequences suddenly needs to be tagged with its
associated character set. If any - it's not at all clear what tag to
apply in code that uses filenames in non-character ways.
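As an illustration of a filesystem rejecting names unsuitable for it,
here is a rough sketch of an 8.3 check that a DOS filesystem might
apply on its own, leaving every other layer untouched. The names
dos_char_ok and dos83_ok are hypothetical, and the allowed-octet set is
a deliberate simplification of the real FAT rules.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified assumption: only A-Z, 0-9, '_' and '-' are allowed. */
static bool
dos_char_ok(unsigned char c)
{
	return (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9') ||
	    c == '_' || c == '-';
}

/*
 * Hypothetical per-filesystem check: accept only 8.3 names, that is a
 * base of 1-8 octets, optionally followed by '.' and 1-3 more octets.
 */
static bool
dos83_ok(const unsigned char *name, size_t len)
{
	size_t base = 0, ext = 0;
	bool seen_dot = false;

	for (size_t i = 0; i < len; i++) {
		if (name[i] == '.') {
			if (seen_dot)
				return false;	/* more than one dot */
			seen_dot = true;
		} else if (!dos_char_ok(name[i])) {
			return false;
		} else if (seen_dot) {
			ext++;
		} else {
			base++;
		}
	}
	if (base < 1 || base > 8)
		return false;
	return seen_dot ? (ext >= 1 && ext <= 3) : true;
}
```

The check lives entirely inside the one filesystem that cares; nothing
above it has to stop treating names as octet sequences.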
> I think UTF-8 could also handle this, which is great because it
> doesn't require any changes to the UFS on-disk representation.
It requires changes to the UFS on-disk specification, because it
renders existing pathname components which are invalid UTF-8 sequences
illegal. It most certainly changes how filenames are represented
on-disk when those filenames are not thought of as UTF-8 by the entity
that produced them.
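To show which existing names would become illegal, here is a sketch of
a strict UTF-8 well-formedness check. The function name utf8_ok is
hypothetical; the rules it applies (no overlong forms, no surrogates,
nothing above U+10FFFF) follow the usual UTF-8 definition, but treat it
as an illustration rather than a vetted implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true iff s[0..len) is well-formed UTF-8. */
static bool
utf8_ok(const unsigned char *s, size_t len)
{
	size_t i = 0;

	while (i < len) {
		unsigned char c = s[i];
		size_t need;		/* continuation octets expected */
		unsigned long cp, min;	/* code point, smallest legal value */

		if (c < 0x80) {
			i++;
			continue;
		} else if (c >= 0xc2 && c <= 0xdf) {
			need = 1; cp = c & 0x1f; min = 0x80;
		} else if ((c & 0xf0) == 0xe0) {
			need = 2; cp = c & 0x0f; min = 0x800;
		} else if (c >= 0xf0 && c <= 0xf4) {
			need = 3; cp = c & 0x07; min = 0x10000;
		} else {
			return false;	/* stray continuation or bad lead */
		}
		if (len - i - 1 < need)
			return false;	/* truncated sequence */
		for (size_t k = 1; k <= need; k++) {
			if ((s[i + k] & 0xc0) != 0x80)
				return false;
			cp = (cp << 6) | (s[i + k] & 0x3f);
		}
		if (cp < min || cp > 0x10ffff ||
		    (cp >= 0xd800 && cp <= 0xdfff))
			return false;	/* overlong, too large, surrogate */
		i += need + 1;
	}
	return true;
}
```

A name created by a Latin-1 user as the octets 0x63 0x61 0x66 0xe9 is
a perfectly good name today, yet fails this check, so it would have to
be rejected or reinterpreted under a UTF-8-only rule.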
/~\ The ASCII                           der Mouse
\ / Ribbon Campaign
 X  Against HTML               firstname.lastname@example.org
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B