Subject: Re: NetBSD NSF server with OS X NFS clients
To: Bill Stouder-Studenmund <wrstuden@netbsd.org>
From: Chuck Swiger <cswiger@mac.com>
List: netbsd-users
Date: 09/13/2007 14:38:41
On Sep 13, 2007, at 1:28 PM, Bill Stouder-Studenmund wrote:
>>> o TextEdit.app fails to open the test file whether I try File->Open
>>> within TextEdit or if I specify FileEdit.app from the Finder
>>>
>>> o Safari was the same as TextEdit, and won't open the file
>>>
>>> o Firefox exited with 'The application Firefox quit unexpectedly' (I
>>> tried twice, and it fell over twice) when I try to open the file
>>
>> The Mac HFS filesystem represents all filenames in Unicode (UTF16)
>> [1], which is not the case for Berkeley FFS aka the UFS of NetBSD. I
>> suspect that the tools mentioned above notice that the filename
>> contains non-ASCII characters, and convert the path into the UTF16
>> representation they expect to find and fail because that pathname
>> doesn't really exist over NFS.
>
> Yes, but NFS isn't HFS.
I don't recall anyone saying it was, but true enough.
> No one's going to send UTF-16 over the wire for NFS. It would
> never work.
> UTF-8 however would.
Sort of. The XDR definition from RFC-1832 actually limits strings in
the NFS protocol to 7-bit US-ASCII, but it is common for
implementations to support filenames encoded in 8-bit ISO-Latin-1.
Rumor has it that NFSv4 might have full support for Unicode via
either UTF-16 or UTF-8.
> A more likely failure is that one of the frameworks is detecting
> that the path is not valid UTF-8 and rejecting things based on that.
That's certainly possible. From
http://developer.apple.com/documentation/MacOSX/Conceptual/BPInternational/Articles/FileEncodings.html:
"File Systems and Unicode Support

Different file systems in Mac OS X have different levels of Unicode
support:

Mac OS Extended (HFS+) uses canonically decomposed Unicode 3.2 in
UTF-16 format, which consists of a sequence of 16-bit codes.
(Characters in the ranges U2000-U2FFF, UF900-UFA6A, and U2F800-U2FA1D
are not decomposed.) The UFS file system allows any character from
Unicode 2.1 or later, but uses the UTF-8 format, which consists
mostly of 8-bit ASCII codes but which may also include multibyte
codes. (Characters in the ranges U2000-U2FFF, UF900-UFA6A, and
U2F800-U2FA1D are not decomposed.) Mac OS Standard (HFS) does not
support Unicode and instead uses legacy Mac encodings, such as
MacRoman.

Locking the canonical decomposition to a particular version of
Unicode does not exclude usage of characters defined in a newer
version of Unicode. Because the Unicode consortium has guaranteed not
to add any more precomposed characters, applications can expect to
store characters defined in future versions of Unicode without
compatibility issues.

All BSD system functions expect their string parameters to be in
UTF-8 encoding and nothing else. Code that calls BSD system routines
should ensure that the contents of all const *char parameters are in
canonical UTF-8 encoding. In a canonical UTF-8 string, all
decomposable characters are decomposed; for example, é (0x00E9) is
represented as e (0x0065) + ´ (0x0301). To put things into a
canonical UTF-8 encoding, use the “file-system representation”
interfaces defined in Cocoa and Carbon (including Core Foundation)."
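To make the decomposition in that last quoted paragraph concrete, here
is a minimal Python sketch (mine, not from Apple's document). It uses
the standard NFD form from the unicodedata module as a stand-in; the
decomposition Apple describes skips the ranges listed in the quote, so
treat this as an approximation rather than what the Mac frameworks
actually run.

```python
import unicodedata

# "é" as one precomposed code point, U+00E9.
precomposed = "\u00e9"

# Standard canonical decomposition (NFD); an approximation of the
# "canonical UTF-8" described in the quoted Apple text.
decomposed = unicodedata.normalize("NFD", precomposed)

# Code points after decomposition: e (U+0065) + combining acute (U+0301).
codepoints = [hex(ord(c)) for c in decomposed]
print(codepoints)

# The same visible character has three different byte representations,
# depending on the encoding and normalization in use.
print(precomposed.encode("latin-1"))  # one byte in ISO-Latin-1
print(precomposed.encode("utf-8"))    # two bytes, precomposed UTF-8
print(decomposed.encode("utf-8"))     # three bytes, decomposed UTF-8
```

A server that stored the name in one of these forms will not match a
lookup made with another, since the byte strings differ.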
It'd be interesting for one of the people reporting the problem to
run tcpdump against their NFS traffic and see how these filenames are
actually being encoded in the requests.
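Once the raw filename bytes have been pulled out of a capture, something
like the following rough Python sketch could sort them into the cases
discussed above. The function name classify_name and its labels are
made up for illustration, and "possibly ISO-Latin-1" is only a guess:
bytes that fail to decode as UTF-8 could be in any 8-bit encoding.

```python
import unicodedata

def classify_name(raw: bytes) -> str:
    """Guess how a filename seen on the wire is encoded (heuristic only)."""
    # Pure 7-bit names are identical in ASCII, Latin-1 and UTF-8.
    if all(b < 0x80 for b in raw):
        return "ascii"
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8; an 8-bit legacy encoding is one possibility.
        return "not utf-8 (possibly ISO-Latin-1)"
    # Valid UTF-8: check which normalization form it is already in.
    if text == unicodedata.normalize("NFD", text):
        return "utf-8, decomposed"
    if text == unicodedata.normalize("NFC", text):
        return "utf-8, precomposed"
    return "utf-8, mixed normalization"

print(classify_name(b"caf\xe9"))       # Latin-1 style single byte
print(classify_name(b"caf\xc3\xa9"))   # precomposed UTF-8
print(classify_name(b"cafe\xcc\x81"))  # decomposed UTF-8
```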
Regards,
-- 
-Chuck