Subject: Re: Unicode support in iso9660.
To: der Mouse <mouse@Rodents.Montreal.QC.CA>
From: Jaromir Dolecek <jdolecek@NetBSD.org>
List: tech-kern
Date: 11/19/2004 21:37:24
der Mouse wrote:
> > I think this could be handled if UTF-8 were the standard encoding for
> > userland<->kernel interaction, yes?
> 
> I think this would be a mistake.  File names have traditionally been
> opaque octet sequences containing 0x2f only as pathname component
> separator and containing 0x00 only as terminator, not character
> sequences, and I think changing that would be a Wrong Thing.
 
UTF-8 doesn't change that. UTF-8 stores each US-ASCII character (0-127)
as-is, i.e. as a single byte with the value 0-127. Characters outside
US-ASCII are encoded as sequences of two or more bytes, each with a
value in the range 128-255. Such a sequence never contains a byte in
the US-ASCII range. So a 0x2f byte in a UTF-8 string can only ever be
'/', and likewise a 0x00 byte can only ever be the terminator.
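
To make the point concrete, here is a small sketch in Python (not kernel
code, just an illustration). The helper names and the sample file name
are made up for this example. It checks that every byte produced for a
non-ASCII character is 0x80 or above, so splitting the raw octets on
0x2f gives the same components as splitting the character string on '/'.

```python
def non_ascii_utf8_bytes(text):
    """Collect the UTF-8 bytes of every character outside US-ASCII."""
    out = []
    for ch in text:
        if ord(ch) > 127:
            out.extend(ch.encode("utf-8"))
    return out


def split_components(raw):
    """Split a raw octet path on 0x2f, as a kernel path lookup would."""
    return raw.split(b"/")


# Hypothetical path with non-ASCII characters in every component.
name = "příliš/žluťoučký/kůň.txt"
raw = name.encode("utf-8")

# Every byte belonging to a non-ASCII character has the high bit set.
high_only = all(b >= 0x80 for b in non_ascii_utf8_bytes(name))

# Splitting the octets on 0x2f matches splitting the characters on '/'.
components = [c.decode("utf-8") for c in split_components(raw)]
same_split = components == name.split("/")

# No 0x00 byte appears unless the string itself contains NUL.
no_nul = 0 not in raw

print(high_only, same_split, no_nul)
```

The kernel can therefore keep treating names as opaque octet sequences;
nothing in it has to know that the octets happen to be UTF-8.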

Jaromir
-- 
Jaromir Dolecek <jdolecek@NetBSD.org>            http://www.NetBSD.cz/
-=- We should be mindful of the potential goal, but as the Buddhist -=-
-=- masters say, ``You may notice during meditation that you        -=-
-=- sometimes levitate or glow.   Do not let this distract you.''   -=-