Subject: Re: (long) NFS misbehaving under -current?
To: Urban Boquist <firstname.lastname@example.org>
From: Robert Elz <kre@munnari.OZ.AU>
Date: 10/08/1998 21:37:43
    Date:        Thu, 8 Oct 1998 13:02:09 +0200 (CEST)
    From:        Urban Boquist <email@example.com>
| Wolfgang> I'm seeing a problem with amd and/or nfs between two current
| Wolfgang> systems. If the server is rebooted while the client has a
| Wolfgang> directory tree mounted, then the client will hang with 'nfs
| Wolfgang> server not responding' even after the server comes back
| Wolfgang> online.
| This may be related to PR bin/6037.
I doubt it (or rather, if the two are related, it is more likely that the
amd problem is a symptom of a basic kernel NFS problem).
I see the same, and I don't use amd (on NetBSD) at all, just NFS mounts
from fstab. Accesses to filesystems mounted as "soft" never time out,
nor do accesses ever complete after the server returns if they have
hung while the server was down. Interruptible mounts can be interrupted
just fine though. All this suggests that there's a problem with the
timers in the NFS code, or the way they're handled.
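For comparison, the behaviour one would expect from a "soft" mount can be
modelled as a bounded retransmit loop: the request is sent, retransmitted up
to a fixed limit with each attempt guarded by a timer, and if no reply ever
arrives the call fails with ETIMEDOUT rather than blocking forever. This is a
user-space Python sketch only; soft_call, server_replies and max_retrans are
hypothetical names, not anything in the NetBSD kernel, and the timer handling
is reduced to a callback.

```python
import errno

def soft_call(server_replies, max_retrans):
    """Hypothetical model of a 'soft' NFS RPC (not the kernel interface).

    server_replies(attempt) -> bool stands in for "a reply arrived before
    this attempt's timer expired". Returns (error, attempts): error is 0
    on success, errno.ETIMEDOUT once all retransmissions are used up.
    """
    attempts = 0
    for attempt in range(max_retrans + 1):  # initial send plus retransmits
        attempts += 1
        if server_replies(attempt):
            return 0, attempts
    # A soft mount gives up here and reports the failure to the caller.
    return errno.ETIMEDOUT, attempts
```

With max_retrans=3, a server that never answers gives (errno.ETIMEDOUT, 4),
while one that comes back before the third attempt gives (0, 3). Neither
outcome is what the hung accesses described above ever reach.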
While the server is down, kvm (kernel virtual memory) usage increases
dramatically as accesses hang. To me this indicates that there's no limit on
how many requests may enter the hung state after they have consumed kvm. It
would be better to detect that the server is not responding to other requests
(and so probably won't respond to this one either), and to hang the new
request at the syscall interface before any resources are allocated for it.
As it is, this can easily cause kvm exhaustion and a panic (which at least
clears up the hung nfs processes....)
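The alternative suggested above, noticing that the server has stopped
answering and holding new requests at the syscall interface before they
allocate anything, amounts to a per-mount admission check. A minimal Python
model, assuming a hypothetical MountGate with a cap on outstanding requests
and a "server down" flag (none of these names or errno choices come from the
NetBSD NFS code), might look like this:

```python
import errno

class MountGate:
    """Hypothetical per-mount admission gate, consulted before a request
    allocates any kernel memory (a model, not kernel code)."""

    def __init__(self, max_outstanding):
        self.max_outstanding = max_outstanding  # cap on requests holding resources
        self.outstanding = 0
        self.server_down = False  # set after a timeout, cleared by any reply

    def admit(self):
        """Return 0 if the request may proceed, otherwise an errno value.
        For a hard mount a real implementation would probably sleep here
        rather than fail; this model only reports the decision."""
        if self.server_down:
            return errno.ETIMEDOUT
        if self.outstanding >= self.max_outstanding:
            return errno.EAGAIN  # assumed choice for "too many in flight"
        self.outstanding += 1
        return 0

    def finish(self, got_reply):
        """Release an admitted request and record what it says about the server."""
        self.outstanding -= 1
        self.server_down = not got_reply

    def probe_reply(self):
        """A reply to some probe (e.g. a NULL RPC) shows the server is back."""
        self.server_down = False
```

The point of the design is that a refused request never holds kvm: once one
request has timed out, later ones stop at admit() instead of piling up, and
the number of requests that can be stuck at once is bounded by
max_outstanding.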
I have recently started a bit of a look through the nfs code to see if I can
spot what is going on in there - but as I have never looked there before, it
is going to take a while I expect. If someone who knows it better finds the
problem(s) sooner than I, I certainly won't regret it!