Subject: Re: nfsd on i386 current hangs in vnlock
To: None <current-users@netbsd.org>
From: Aaron J. Grier <agrier@poofygoof.com>
List: current-users
Date: 12/15/2005 14:22:09
On Thu, Dec 15, 2005 at 08:43:28AM -0800, Bill Studenmund wrote:
> On Thu, Dec 15, 2005 at 12:29:15PM +0000, Bruce O'Neel wrote:
> > I played some more last night after I came home, and, once again,
> > all nfsd processes were in vnlock again.  From the log messages it
> > seemed that everything had been frozen for over 2 hrs.
> 
> Hmmm.... Sounds like something else may be wrong. Any hardware issue 

I've seen similar behavior in the 2-0 branch with 2.0.1.  2.0.2_STABLE
seems much better in that respect.  I still suspect something rotten
with NFS triggered by packet loss and retransmits.  Dropping the NFS
read/write size from 32k down to 4k and using UDP instead of TCP seemed
to help my situation:

alpha:/usr/home on /amd/alpha/usr/home type nfs (writes: sync 0 async 0,
[nfs: addr=10.0.0.28, port=2049, addrlen=16, sotype=2, proto=17,
fhsize=0, flags=0x8256<wsize,rsize,retrans,intr,nfsv3,resvport>,
wsize=4096, rsize=4096, readdirsize=4096, timeo=300, retrans=100,
maxgrouplist=16, readahead=2, leaseterm=30, deadthresh=9])
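
For anyone who wants to check a mount the same way, here is a minimal
sketch in Python that picks apart the bracketed option list above and
reports the transport and transfer sizes.  The parse_nfs_options helper
is hypothetical (it is not part of any NetBSD tool) and it assumes the
mount output format shown above.  proto=17 is the IANA protocol number
for UDP, which is how the output confirms a UDP mount.

```python
import re
import socket

# The mount output from above, joined onto one line.
MOUNT_LINE = (
    "alpha:/usr/home on /amd/alpha/usr/home type nfs (writes: sync 0 async 0, "
    "[nfs: addr=10.0.0.28, port=2049, addrlen=16, sotype=2, proto=17, "
    "fhsize=0, flags=0x8256<wsize,rsize,retrans,intr,nfsv3,resvport>, "
    "wsize=4096, rsize=4096, readdirsize=4096, timeo=300, retrans=100, "
    "maxgrouplist=16, readahead=2, leaseterm=30, deadthresh=9])"
)

# IP protocol numbers mapped to readable transport names.
TRANSPORTS = {socket.IPPROTO_UDP: "UDP", socket.IPPROTO_TCP: "TCP"}


def parse_nfs_options(line):
    """Hypothetical helper: return (options, flag_names) from "[nfs: ...]"."""
    body = re.search(r"\[nfs: (.*)\]", line).group(1)
    flag_match = re.search(r"flags=(0x[0-9a-fA-F]+)<([^>]*)>", body)
    flag_names = flag_match.group(2).split(",")
    # Drop the <...> name list so only plain key=value pairs remain.
    body = body.replace("<" + flag_match.group(2) + ">", "")
    options = dict(re.findall(r"(\w+)=([^,\s]+)", body))
    return options, flag_names


options, flag_names = parse_nfs_options(MOUNT_LINE)
transport = TRANSPORTS.get(int(options["proto"]), "unknown")
print("transport:", transport)
print("rsize/wsize:", options["rsize"], options["wsize"])
print("flags set:", ", ".join(flag_names))
```
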

I think TCP vs UDP simply pushes the retransmit mechanism from one side
of the network stack to the other, but doesn't fix the apparent livelock
which occurs when the client loses track of the server far enough to
generate the "not responding" messages.  It's as if the client
completely gives up trying to contact the server after a certain point,
and thus the mount just hangs.

http://mail-index.netbsd.org/netbsd-users/2005/04/25/0002.html

I've been meaning to try to replicate this with a kgdb-enabled client
under controlled circumstances (a crappy 10Mbit half-duplex hub would be
sufficient) but haven't had the time...

-- 
  Aaron J. Grier | "Not your ordinary poofy goof." | agrier@poofygoof.com
              "silly brewer, saaz are for pils!"  --  virt