Subject: How can I help with hung vnlock()'ed clients?
To: None <tech-kern@netbsd.org>
From: Stephen M. Jones <smj@cirr.com>
List: tech-kern
Date: 10/30/2003 20:50:56
Hi there!

I'm wondering how best I can help with a problem that I, and others, have
experienced for a couple of years now.  An NFS client gets 'hung' in a
vnlock() and most likely will not recover.  Since I have the opportunity to
run NetBSD in a high-usage environment, I thought I might be able to help
out .. but I want to know the best way to do so.

Typically the NFS server and clients run the same kernel version, which is
built from the 1.6.1 release tree.  I currently save a trace and ps output
while in the debugger and then dump core to preserve the memory image.
Unfortunately, for performance reasons I cannot run a kernel with symbols on
the server, but I can on a client.  Note that the server never seems to have
the vnlock problem, only the clients .. the server typically runs for weeks
without an issue, while a client can get into a vnlock() hang anywhere from
2 hours to 2 weeks after booting.

If you have any tips or suggestions on how I can gather evidence so that
we can nail this one, please let me know .. I'll do my best.  I'm also aware
that the saved cores should only be examined by trusted eyes.

Thanks
Stephen