Subject: Re: NFS problems
To: None <rick@snowhite.cis.uoguelph.ca>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 07/23/2002 10:38:21
On Tue, 23 Jul 2002 rick@snowhite.cis.uoguelph.ca wrote:

> Well, I'm not directly involved in the NetBSD code, but here are a few
> random comments that might be useful:
>
> - If you have too large a blocksize when using NFS over UDP, you'll
>   see "IP Fragments dropped due to timeout" when you do "netstat -s".
>   (These indicate that fragments of the large UDP datagram aren't
>    making it through the network interconnect and cause serious
>    performance degradation. The "fix" is to either reduce the read/write
>    data size or switch to TCP. See "man mount_nfs" to find out how to
>    do either of these.)
>
> - For NFS Version 2, the spec. (RFC 1094) stipulated a maximum of 8,192 bytes,
>   so using a larger blocksize for V2 violates the spec. (I'm sure
>   implementations do it and I'm sure some work, but if you want to be
>   technically correct, you should only allow block sizes > 8192 for V3.)
>
> - The NFS code is much more sensitive to buffer cache race conditions
>   than local file systems. Among other reasons is the fact that local
>   file systems almost never take several seconds to do an I/O operation.
>   (Mounting a really slow NFS server, like one with debugging printfs
>    turned on, is the best way to "find" these. Been there, have the T-shirt.
>    Now, once you "find" them, fixing them can be great fun.) In the past,
>   when I saw mysterious intermittent hangs, it usually turned out to be
>   buffer cache bugs. "ps axl" should give you a hint, based on what the
>   processes are waiting on.

I would tend to think something like this is the problem. Remember that
Steve pointed out he's seeing the problem over the loopback, so I don't
think that packet fragmentation is the issue. :-)
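
For reference, the fragmentation arithmetic behind Rick's first point can
be sketched roughly as below. This is only an illustration, not anything
taken from the NFS code: the function names are made up, the payload
ignores RPC/NFS header overhead, a plain 20-byte IPv4 header is assumed,
and fragments are assumed to be dropped independently of each other.

```python
import math

IP_HEADER = 20   # bytes; assumes an IPv4 header with no options
UDP_HEADER = 8   # bytes

def fragment_count(payload_bytes, mtu=1500):
    """Number of IPv4 fragments needed to carry one UDP datagram."""
    # Every fragment except the last must carry a multiple of 8 data bytes.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    datagram = UDP_HEADER + payload_bytes
    return math.ceil(datagram / per_fragment)

def datagram_loss(payload_bytes, fragment_loss, mtu=1500):
    """Chance the whole datagram is lost, assuming independent drops."""
    # Losing any single fragment means the datagram cannot be reassembled.
    n = fragment_count(payload_bytes, mtu)
    return 1.0 - (1.0 - fragment_loss) ** n

if __name__ == "__main__":
    for size in (1024, 8192, 32768):
        print(size, fragment_count(size), round(datagram_loss(size, 0.01), 4))
```

The larger the read/write size, the more fragments each request needs
and the more likely one of them goes missing, which is why shrinking the
block size or switching to TCP helps on a lossy link. With an MTU large
enough to hold the whole datagram there is only one fragment, which fits
with fragmentation being an unlikely culprit over the loopback.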

Take care,

Bill