Subject: Re: Strange NFS-client bug
To: None <tech-kern@netbsd.org>
From: der Mouse <mouse@Rodents.Montreal.QC.CA>
List: tech-kern
Date: 07/19/2002 02:35:07
> We have two Mac QuickSilver 2002 running 1.6 -current (4 days old)
> that we have a problem with the NFS-clients on.  The really strange
> thing is that it is only one file that we have this problem with.
> [...]

Have you tried moving the offending bytes to lower in the file but the
same position modulo 8192 (and truncating the file size to match)?
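
One way to build such a test file, as a minimal sketch: drop whole
8192-byte blocks from the front, so everything after them moves lower
by a multiple of 8192 and the file shrinks by the same amount.  The
function names here are made up for illustration, and this is only one
reading of "truncating to match".

```python
BLOCK = 8192  # the modulus suggested above


def drop_leading_blocks(data, nblocks, block=BLOCK):
    """Return data with nblocks whole blocks removed from the front."""
    cut = nblocks * block
    if cut > len(data):
        raise ValueError("cannot drop more blocks than the file holds")
    return data[cut:]


def new_offset(offset, nblocks, block=BLOCK):
    """Where a byte at 'offset' ends up after drop_leading_blocks()."""
    moved = offset - nblocks * block
    if moved < 0:
        raise ValueError("that byte lies inside the dropped blocks")
    return moved
```

Since only whole blocks are removed, new_offset(off, n) % 8192 equals
off % 8192 for any n that leaves the byte in the file.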

I recall a situation I was in which isn't quite the same but may be
similar enough to offer some hints: we had an NFS server (S) and client
(C).  Attempting to read certain files on S, from C, would stall the
process (not the whole mount point); if the mount was soft, the process
would be interruptible as usual.  This happened on only a few percent
of the files on the server, with no visible rhyme or reason to which
files.

Eventually, running etherfind (a SunOS tcpdump-alike) revealed the
problem: S's idea of the minimum Ethernet packet size was four bytes
less than C's hardware's ditto.  (S's was 60, C's was 64 - AIUI, S was
correct.)  So when reading the last block of the file, for certain file
sizes, S would send out a packet the last fragment of which was <64
bytes (if <60, S would pad to 60 bytes).  C was unable to receive this
(Ethernet) packet, so reassembly would fail and the (IP) packet would
be lost.

The solution was to use "rsize=1024,wsize=1024" on the mount.  At that
size nothing needed fragmenting at the IP level and, with the UDP and
RPC headers, no packet was <64 bytes.  A larger packet, when fragmented
at the IP level, could have a tiny last fragment, thereby provoking the
problem.

Now, your situation is visibly different in various ways.  In
particular, changing the data matters.

> I've tried putting the file on three different NFS-servers, one
> running NetBSD 1.6 -current, one running FreeBSD 4.6 and one running
> Linux, and the problems remains the same.  I've used other
> NFS-clients (NetBSD 1.5.2 on i386, FreeBSD 4.6 and Linux) and they
> can read the file without any problems.

Have you tried any other NFS client on the same client hardware?

> This leads me to belive that the bug is in the NetBSD -current
> NFS-client (or some of its subsystems, RPC?).

If you haven't verified that other software on the same hardware can
read the file, I'd be inclined to suspect your hardware.  You say you
have _two_ Macs; do they both exhibit identical behaviour as clients in
this regard?  That would make it more likely it is software - or
perhaps a bug in a particular chip rev, used in both machines.

It might be interesting to tcpdump and see if the client machine can
receive the relevant NFS packet at all.  If so, and the contents are
correct (compare vs a copy tcpdumped on another machine), that points
to software; if not, that points to hardware.
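
For the comparison, something like the following could do as a rough
sketch.  It assumes both captures were saved with "tcpdump -w" in the
classic pcap format, with a snap length large enough to hold whole
frames, and that both machines saw the same frames on the wire; the
function names are made up here.

```python
import struct


def read_pcap(data):
    """Return the captured frames (as bytes) from a classic pcap image."""
    magic = data[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"      # written on a little-endian machine
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"      # written on a big-endian machine
    else:
        raise ValueError("not a classic pcap file")
    frames = []
    pos = 24              # skip the 24-byte global header
    while pos + 16 <= len(data):
        # Per-record header: ts_sec, ts_usec, incl_len, orig_len.
        _sec, _usec, incl_len, _orig = struct.unpack_from(
            endian + "IIII", data, pos)
        pos += 16
        frames.append(data[pos:pos + incl_len])
        pos += incl_len
    return frames


def missing_frames(reference, suspect):
    """Frames in the reference capture that the suspect capture lacks."""
    seen = set(suspect)
    return [f for f in reference if f not in seen]
```

Feed read_pcap() the contents of each capture file (opened in binary
mode), then pass the frame list from the other machine as reference
and the one from the Mac as suspect.  An empty result suggests the Mac
received everything intact; anything left over is a frame it either
never saw or saw with different contents.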

/~\ The ASCII				der Mouse
\ / Ribbon Campaign
 X  Against HTML	       mouse@rodents.montreal.qc.ca
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B