Subject: Re: poor nfs error recovery
To: None <firstname.lastname@example.org>
From: der Mouse <mouse@Collatz.McRCIM.McGill.EDU>
Date: 12/14/1994 10:24:17
> I have an i386 machine with an old 8 bit 3c503 card; this card is
> lame enough that it produces a handful of "ring buffer overruns"
> under heavy network load. This machine both serves and clients NFS.
> Due (I presume) to the ethernet card, access (read/write) of file
> larger than ~6K hangs. (5.8K works, 6.2K fails).
What are the exact sizes? What about other sizes?
Someone else already suggested that you may be receiving more packets
back-to-back than you have buffer space for, and suggested pulling back
the rsize and wsize on the mount.
I ran into a similar problem once, but it was much more infuriating.
Machine A would NFS mount a directory from machine B. Trying to cat
certain files - maddeningly few of them, but a given file either always
worked or always failed - would hang, as if the server were not responding.
What I eventually tracked this down to was this: machine A's Ethernet
hardware was incapable of receiving packets smaller than 64 bytes.
Machine B's Ethernet hardware would happily send out packets as small
as 60 bytes (smaller packets would be padded). When an NFS reply was
fragmented, the last fragment's packet on the wire could be smaller
than 64 bytes. Machine A would reliably drop this last fragment,
leading to reassembly failing and retransmissions, which would be
dropped equally reliably. Thus, the problem struck with files whose
sizes were just right to provoke the last read request's reply to (a)
be large enough to be fragmented and (b) just the right size modulo the
amount of data per fragment that the last fragment was tiny.
It struck only with the last fragment of fragmented packets, not with
even single-byte full reply packets, because with not only the IP
header but also the UDP and RPC stuff, the packet was invariably over
64 bytes long. But the second and later fragments of a fragmented
packet carry only a minimal IP header, with no UDP or RPC headers.
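The arithmetic behind this can be sketched roughly as follows. This is only an illustration under stated assumptions: a 14-byte Ethernet header counted without the trailing checksum, a 20-byte IP header with no options, and a 1500-byte MTU. The function names are made up for the example, and the 60 and 64 byte limits are the ones described above.

```python
ETH_HEADER = 14      # destination + source + type, no checksum (assumption)
IP_HEADER = 20       # IPv4 header without options (assumption)
MTU = 1500           # usual Ethernet MTU (assumption)
MIN_SENT = 60        # smallest frame machine B would send, per the text
MIN_RECEIVED = 64    # smallest frame machine A could receive, per the text


def last_fragment_frame_len(ip_payload_len):
    """Return the on-wire length of the last fragment, or None if unfragmented.

    ip_payload_len is everything after the IP header: UDP header plus RPC reply.
    """
    if ip_payload_len <= MTU - IP_HEADER:
        return None                                # fits in one packet
    per_fragment = (MTU - IP_HEADER) // 8 * 8      # fragment data is a multiple of 8
    remainder = ip_payload_len % per_fragment or per_fragment
    # Short frames are padded up to the sender's minimum.
    return max(ETH_HEADER + IP_HEADER + remainder, MIN_SENT)


def would_be_dropped(ip_payload_len):
    """True if the last fragment is too small for machine A to receive."""
    frame = last_fragment_frame_len(ip_payload_len)
    return frame is not None and frame < MIN_RECEIVED


if __name__ == "__main__":
    for size in (1480, 1481, 1509, 1510, 2961):
        print(size, last_fragment_frame_len(size), would_be_dropped(size))
```

Under these assumptions only replies whose last fragment carries 1 to 29 bytes of data fall into the bad window, which fits with how few files were affected.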
Setting rsize=1024,wsize=1024 (so that all replies were small enough to
fit in a single Ethernet packet and thus never need fragmenting) cured
the problem. This is why I asked about other sizes: if you're
suffering from something similar, larger sizes may work fine.
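As a rough check of why rsize=1024 avoids fragmentation, the sketch below tests whether a read reply fits in one Ethernet packet. The RPC/NFS reply overhead is not given anywhere above, so it is passed in as a parameter, and the 200 bytes used in the example is only a deliberately generous guess; the 20-byte IP header, 8-byte UDP header, and 1500-byte MTU are likewise assumptions.

```python
IP_HEADER = 20    # IPv4 header without options (assumption)
UDP_HEADER = 8    # fixed UDP header size
MTU = 1500        # usual Ethernet MTU (assumption)


def fits_in_one_packet(rsize, rpc_overhead):
    """True if a read reply carrying rsize data bytes needs no IP fragmentation.

    rpc_overhead is the size of the RPC and NFS reply headers in bytes;
    the caller has to supply it, since the real figure is not known here.
    """
    return IP_HEADER + UDP_HEADER + rpc_overhead + rsize <= MTU


if __name__ == "__main__":
    guessed_overhead = 200   # hypothetical upper guess, not a measured value
    for rsize in (1024, 2048, 8192):
        print(rsize, fits_in_one_packet(rsize, guessed_overhead))
```

With that guess, 1024 leaves plenty of room in a single packet, while 2048 and 8192 would both have to be fragmented.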