Subject: lockinit / vnlock deadlocking
To: None <email@example.com, firstname.lastname@example.org>
From: Stephen M. Jones <email@example.com>
Date: 11/08/2003 02:40:15
I hope that this can help. The problem that I'm seeing on our NFS clients
nearly on a daily basis is that a client will get hung in a vnlock deadlock.
To help track this problem, I changed the "vnlock" text in lockinit() on
line 538 in the file kern/vfs_subr.c to "vnlockone". Just this past week,
I also changed the one on line 1101 to "vnlocktwo".
We've had about 5 vnlock deadlocks, and each time it has been "vnlockone"
on line 538.
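
To illustrate why renaming the wait message narrows things down, here is a
minimal userland sketch. It is not kernel code: struct demo_lock,
demo_lockinit() and demo_lock_wmesg() are hypothetical names made up for this
example, and they only model the idea that each lock initialization site
records its own message, so a blocked process reports which site created the
lock it is stuck on.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a kernel lock; NOT the real struct lock. */
struct demo_lock {
    const char *wmesg;  /* wait message reported while sleeping on this lock */
    int held;           /* 1 while some owner holds the lock */
};

/* Hypothetical analogue of lockinit(): remember the per-site wait message. */
void demo_lockinit(struct demo_lock *lk, const char *wmesg)
{
    assert(lk != NULL && wmesg != NULL);
    lk->wmesg = wmesg;
    lk->held = 0;
}

/*
 * Try to take the lock. Returns NULL if the lock was free (the caller now
 * holds it), otherwise the wait message a blocked caller would show,
 * similar in spirit to what ^T prints for a sleeping process.
 */
const char *demo_lock_wmesg(struct demo_lock *lk)
{
    assert(lk != NULL);
    if (lk->held)
        return lk->wmesg;
    lk->held = 1;
    return NULL;
}
```

With two locks initialized as "vnlockone" and "vnlocktwo", a second attempt
on the first lock returns "vnlockone", which is the kind of evidence that
points at the initialization on line 538 rather than the one on line 1101.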
If the client is being used and the deadlock occurs, usually local file
access will be fine, but remote access (ls, df, et cetera) will hang and
a ^T will report "vnlockone". Once the deadlock occurs, the machine will
become swamped by user requests, cronjobs and such so that the only
solution is to do a hard reboot.
I don't believe this is an ethernet driver issue (some may believe it could
be, and maybe I'm wrong). The reason is that the DS10Ls use the tlp
driver while the CS20s use the fxp driver. Both can get deadlocked with the
Depending on the situation, the kernel can panic, but most of the time
it will just hang until the process table has filled.
Some things I'd like to note:
* nfsd options are -tun 12
* there are 7 nfs clients (mounting home directories, webspace & mail)
* typically 80 to 100 users per client at any given time
* nfs is run on a separate network from normal traffic using independent
* occasionally (almost rarely), messages will be seen regarding the server
not responding, then responding.
* bufcache on the fileserver is 12% of 1024 MB of RAM
* only two or three nfsd processes actually seem to be busy
Just for a clear reference, here is a portion of the vfs_subr.c code:
536 vp->v_type = VNON;
537 vp->v_vnlock = &vp->v_lock;
538 lockinit(vp->v_vnlock, PVFS, "vnlockone", 0, 0);
540 vp->v_tag = tag;
541 vp->v_op = vops;
542 insmntque(vp, mp);
543 *vpp = vp;
544 vp->v_usecount = 1;
545 vp->v_data = 0;
If there is any other information I can provide, please let me know. Also,
since this is happening on a daily basis, I'd be happy to work closely
with a kernel guru to see if we can sort this out.