tech-kern archive


NFS tstiles



We got a new NFS server at work over the weekend, and it's been a bit
flaky and causing my machines to hang left and right. In the course of
poking about with ddb and crash I've found no fewer than three
problems:

(1) The syncer holds syncer_mutex while calling VOP_FSYNC. If
VOP_FSYNC goes off to pick daisies (perhaps for one of the reasons
below, perhaps something else), syncer_mutex remains locked forever.
Since unmounting requires syncer_mutex, any attempt to unmount
anything, including umount -f on the offending volume, hangs forever
in an uninterruptible sleep on "tstile". Note that although the syncer
itself is not in an uninterruptible sleep (at least if you've done
your NFS mounts correctly), that doesn't help: to the best of my
knowledge there's no way to interrupt or send signals to a kernel
thread.

(Also note that doing I/O while holding a mutex is contrary to rmind's
repeated assertion that mutexes aren't supposed to be held for long
periods of time.)
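
To make the shape of problem (1) concrete, here is a minimal userland
sketch using pthreads. It is not the kernel code: syncer_lock,
fake_fsync, syncer_thread and unmount_attempt_while_stuck are
hypothetical stand-ins for syncer_mutex, VOP_FSYNC, the syncer and
the unmount path. It only shows that a lock held across a blocking
call stays unavailable to everyone else for as long as that call
blocks.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for syncer_mutex. */
static pthread_mutex_t syncer_lock = PTHREAD_MUTEX_INITIALIZER;

/* Bookkeeping for the demo only. */
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t state_cv = PTHREAD_COND_INITIALIZER;
static bool syncer_in_fsync = false;
static bool server_answered = false;

/* Stand-in for VOP_FSYNC on a dead server: blocks until "answered". */
static void
fake_fsync(void)
{
	pthread_mutex_lock(&state_lock);
	syncer_in_fsync = true;
	pthread_cond_broadcast(&state_cv);
	while (!server_answered)
		pthread_cond_wait(&state_cv, &state_lock);
	pthread_mutex_unlock(&state_lock);
}

/* Stand-in for the syncer: does blocking I/O with the lock held. */
static void *
syncer_thread(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&syncer_lock);
	fake_fsync();
	pthread_mutex_unlock(&syncer_lock);
	return NULL;
}

/*
 * Returns what a would-be unmounter sees from trylock while the
 * syncer is stuck, or -1 if the demo thread cannot be created.
 * Intended to be called once.
 */
static int
unmount_attempt_while_stuck(void)
{
	pthread_t t;
	int rv;

	if (pthread_create(&t, NULL, syncer_thread, NULL) != 0)
		return -1;

	/* Wait until the syncer is blocked inside fake_fsync. */
	pthread_mutex_lock(&state_lock);
	while (!syncer_in_fsync)
		pthread_cond_wait(&state_cv, &state_lock);
	pthread_mutex_unlock(&state_lock);

	/* The lock is busy; a plain lock call here would hang. */
	rv = pthread_mutex_trylock(&syncer_lock);
	if (rv == 0)
		pthread_mutex_unlock(&syncer_lock);

	/* Let the "server" answer so the demo can terminate. */
	pthread_mutex_lock(&state_lock);
	server_answered = true;
	pthread_cond_broadcast(&state_cv);
	pthread_mutex_unlock(&state_lock);
	pthread_join(t, NULL);
	return rv;
}
```

The sketch uses trylock only so that it can finish; the real unmount
path blocks instead, and that is the "tstile" sleep.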

(2) nfs_receive contains a loop (actually three copies that are
slightly different depending on the socket type) that continues
calling so->so_receive as long as the result is EWOULDBLOCK, i.e., as
long as the receive attempt timed out, i.e., forever. The proper way
out of this loop on a server failure is that nfs_timer() sets
R_SOFTTERM on the nfsreq structure. However, this is apparently not
working. I don't know why yet.
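
The loop structure in (2) can be sketched in plain C as follows. This
is an illustration under stated assumptions, not the nfs_receive
code: struct demo_req, R_SOFTTERM_DEMO, fake_so_receive, fake_timer
and demo_receive are all hypothetical names, the flag value is made
up, and returning EINTR on soft termination is an assumption of the
sketch. The point is that the timer setting the flag is the only way
out when every receive attempt times out.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for R_SOFTTERM; not the kernel's value. */
#define R_SOFTTERM_DEMO 0x01

/* Hypothetical stand-in for the nfsreq structure. */
struct demo_req {
	int flags;     /* soft-termination flag lives here */
	int timeouts;  /* receive attempts that timed out */
};

static struct demo_req demo_request = { 0, 0 };

/* Stand-in for so->so_receive against a dead server: always times out. */
static int
fake_so_receive(struct demo_req *rep)
{
	rep->timeouts++;
	return EWOULDBLOCK;
}

/* Stand-in for nfs_timer(): gives up after 'limit' timeouts. */
static void
fake_timer(struct demo_req *rep, int limit)
{
	if (rep->timeouts >= limit)
		rep->flags |= R_SOFTTERM_DEMO;
}

/*
 * Retry while the receive times out. If fake_timer never set the
 * flag, this loop would spin forever, which is the reported hang.
 */
static int
demo_receive(struct demo_req *rep, int limit)
{
	int error;

	do {
		error = fake_so_receive(rep);
		fake_timer(rep, limit);
		if (error == EWOULDBLOCK && (rep->flags & R_SOFTTERM_DEMO))
			return EINTR;
	} while (error == EWOULDBLOCK);
	return error;
}
```

In the sketch the timer runs inline for simplicity; in the kernel it
runs asynchronously, so the loop depends entirely on that other code
path doing its job.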

(3) nfs_rcvlock contains an infinite loop waiting on nfs_rcvcv; while
it uses cv_timedwait, the timeout logic seems to be a bizarre way of
attempting to allow interruptible sleeps and does nothing to prevent
the loop from looping forever. It is definitely possible for processes
to get stuck in here because I've seen it. However, since it appears
that this logic is a handrolled lock using a condvar (the existence of
which is itself a bug), perhaps getting stuck here is actually a
deadlock condition and a symptom of something else hanging elsewhere?
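
For (3), here is a userland pthreads sketch of a handrolled lock
built on a condvar with a timed wait. The names (struct rcv_lock,
rcv_lock_acquire, rcv_lock_release, demo_lock) and the max_slices
bound are hypothetical and are not how nfs_rcvlock is written. Note
that the timed wait alone does not bound the loop: a timeout just
re-checks the flag and waits again, so without something like the
max_slices check the waiter loops for as long as the holder never
lets go.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical handrolled lock: a flag guarded by a mutex + condvar. */
struct rcv_lock {
	pthread_mutex_t mtx;
	pthread_cond_t cv;
	bool held;
};

static struct rcv_lock demo_lock = {
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false
};

/* Compute an absolute CLOCK_REALTIME deadline 'ms' from now. */
static void
deadline_after_ms(struct timespec *ts, long ms)
{
	clock_gettime(CLOCK_REALTIME, ts);
	ts->tv_sec += ms / 1000;
	ts->tv_nsec += (ms % 1000) * 1000000L;
	if (ts->tv_nsec >= 1000000000L) {
		ts->tv_sec++;
		ts->tv_nsec -= 1000000000L;
	}
}

/*
 * Wait for the lock in slices of slice_ms. Returns 0 on success or
 * ETIMEDOUT after max_slices timed-out slices with the lock still
 * held. Dropping the max_slices check reproduces the endless loop.
 */
static int
rcv_lock_acquire(struct rcv_lock *l, long slice_ms, int max_slices)
{
	int slices = 0;
	int error = 0;

	pthread_mutex_lock(&l->mtx);
	while (l->held) {
		struct timespec ts;
		int rv;

		deadline_after_ms(&ts, slice_ms);
		rv = pthread_cond_timedwait(&l->cv, &l->mtx, &ts);
		if (rv == ETIMEDOUT && l->held && ++slices >= max_slices) {
			error = ETIMEDOUT;
			break;
		}
	}
	if (error == 0)
		l->held = true;
	pthread_mutex_unlock(&l->mtx);
	return error;
}

/* Release the lock and wake any waiters. */
static void
rcv_lock_release(struct rcv_lock *l)
{
	pthread_mutex_lock(&l->mtx);
	l->held = false;
	pthread_cond_broadcast(&l->cv);
	pthread_mutex_unlock(&l->mtx);
}
```

Whether a bounded wait is the right fix is a separate question; if
the holder is itself stuck elsewhere, bounding the wait only turns a
hang into an error.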

-- 
David A. Holland
dholland%netbsd.org@localhost

