Subject: Re: Unicode support in iso9660.
To: None <tech-kern@NetBSD.org>
From: der Mouse <mouse@Rodents.Montreal.QC.CA>
List: tech-kern
Date: 11/22/2004 22:05:56
>>> [...], since [UTF-8 is] the only standard UNIX-compatible way to
>>> handle full Unicode range.
>> (Actually, wouldn't UTF-7 also qualify?  [...])
> I'm not sure utf7 was file-system-safe.  utf8 has the properties that
> no byte of a multibyte sequence can ever be mistaken for another
> character, no matter where you jump into the string, the ASCII range
> is identity mapped, and (it follows) any NUL byte you see has to be a
> real NUL.

> I seem to recall utf7 did not have those properties (actually it
> seems they might not be simultaneously achievable in a 7-bit coding),
> though my memory can be tricky.

Well, obviously, no 7-bit code can both identity-map ASCII and
represent more than ASCII, since ASCII already uses all 128 of the
available 7-bit codes.

But you are correct: UTF-7 (at least as described in RFC 2152) does not
have the property that octets starting in the middle of an encoded
"large" character can be unambiguously identified as such.
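
To make the difference concrete, here is a rough Python sketch (mine,
not taken from either RFC).  It assumes only Python's built-in "utf-8"
and "utf-7" codecs; utf8_byte_kind is a name I made up for
illustration.  It classifies each UTF-8 octet by its high bits, then
shows that the octets inside a UTF-7 shifted sequence are ordinary
ASCII letters that carry no such marker.

```python
def utf8_byte_kind(b: int) -> str:
    """Classify one UTF-8 octet by its high bits (hypothetical helper)."""
    if b < 0x80:
        return "ascii"         # 0xxxxxxx: a complete ASCII character
    if b < 0xC0:
        return "continuation"  # 10xxxxxx: inside a multibyte sequence
    return "lead"              # 11xxxxxx: first octet of a multibyte sequence


text = "\u20ac"  # EURO SIGN, chosen as an arbitrary non-ASCII character

utf8 = text.encode("utf-8")
utf7 = text.encode("utf-7")

# UTF-8: every octet announces its own role, so a reader dropped into
# the middle of the string can tell it is not at a character boundary.
kinds = [utf8_byte_kind(b) for b in utf8]

# UTF-7: every octet is below 0x80, and the octets between the leading
# '+' and the end of the shifted sequence are plain base64 characters.
# Taken on their own they are indistinguishable from literal ASCII text.
middle = utf7[1:-1]

print(utf8, kinds)
print(utf7, middle, middle.decode("utf-7"))
```

Decoding the middle slice by itself yields plain letters rather than an
error, which is exactly the ambiguity in question; the UTF-8 octets, by
contrast, are all tagged as lead or continuation octets.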

/~\ The ASCII				der Mouse
\ / Ribbon Campaign
 X  Against HTML	       mouse@rodents.montreal.qc.ca
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B