Subject: Re: Unicode support in iso9660.
To: None <tech-kern@NetBSD.org>
From: der Mouse <mouse@Rodents.Montreal.QC.CA>
Date: 11/22/2004 22:05:56
>>> [...], since [UTF-8 is] the only standard UNIX-compatible way to
>>> handle full Unicode range.
>> (Actually, wouldn't UTF-7 also qualify? [...])
> I'm not sure utf7 was file-system-safe. utf8 has the properties that
> no byte of a multibyte sequence can ever be mistaken for another
> character, no matter where you jump into the string, the ASCII range
> is identity mapped, and (it follows) any NUL byte you see has to be a
> real NUL.
> I seem to recall utf7 did not have those properties (actually it
> seems they might not be simultaneously achievable in a 7-bit coding),
> though my memory can be tricky.
Well, obviously, no 7-bit code can both identity-map ASCII and
represent more than ASCII, since ASCII already uses all 128 of the
available 7-bit codes.
But you are correct: UTF-7 (at least as described in RFC 2152) does not
have the property that octets starting in the middle of an encoded
"large" character can be unambiguously identified as such.
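
To make the difference concrete, here is a small sketch using Python's
stock "utf-8" and "utf-7" codecs. The helper utf8_byte_kind and the
sample character U+03FF are my own choices for illustration and are not
taken from either RFC. In UTF-8 the top bits of any single octet tell
you whether it is ASCII, a lead octet, or a continuation octet; in
UTF-7 every octet is 7-bit, so an octet from the middle of an encoded
character looks like ordinary ASCII (here it even includes a '/').

```python
def utf8_byte_kind(b: int) -> str:
    """Classify one UTF-8 octet from its high bits alone (no validity check)."""
    if b < 0x80:
        return "ascii"          # 0xxxxxxx: a complete ASCII character
    if (b & 0xC0) == 0x80:
        return "continuation"   # 10xxxxxx: inside a multibyte sequence
    return "lead"               # 11xxxxxx: first octet of a multibyte sequence


sample = "\u03ff"  # arbitrary non-ASCII character, picked for this demo

utf8 = sample.encode("utf-8")
utf7 = sample.encode("utf-7")

# UTF-8: each octet identifies its own role, wherever you land in the string.
utf8_kinds = [utf8_byte_kind(b) for b in utf8]

# UTF-7: every octet is below 0x80, so none can be told apart from plain ASCII.
utf7_all_ascii = all(b < 0x80 for b in utf7)

# The base64 run for this character contains the octet 0x2F, i.e. '/'.
utf7_has_slash = b"/" in utf7

print(utf8_kinds, utf7, utf7_all_ascii, utf7_has_slash)
```

Code that splits a pathname on '/' octets would therefore cut this
UTF-7 name in the middle of a character, while the UTF-8 form contains
no octet below 0x80 at all.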
/~\ The ASCII der Mouse
\ / Ribbon Campaign
X Against HTML email@example.com
/ \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B