Subject: Re: kern/32090: uvm_fault() after "vnode: table is full"
To: None <kern-bug-people@netbsd.org, gnats-admin@netbsd.org,>
From: Chuck Silvers <chuq@chuq.com>
List: netbsd-bugs
Date: 11/18/2005 16:42:01
The following reply was made to PR kern/32090; it has been noted by GNATS.

From: Chuck Silvers <chuq@chuq.com>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org
Subject: Re: kern/32090: uvm_fault() after "vnode: table is full"
Date: Fri, 18 Nov 2005 08:41:28 -0800

 there are several other PRs about crashes after "vnode: table is full",
 and all of them also involve a layered file system:
 
 28705	2.0 kernel panic in layer_unlock with "vnode: table is full"
 31979	panic: lockmgr: release of unlocked lock (layerfs)
 29670	"release of unlocked lock" panic with null fs
 
 the fix for the last one added a tangentially related improvement (falling
 back to reclaiming a vnode from the hold list when there are vnodes on the
 free list but they're all busy).  this makes it much less likely that the
 problem will be seen, but it doesn't fix the bug responsible for the crash.
 this change should be pulled up to the 2.x branches, though.
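
 the fallback described above can be sketched roughly as follows.  this is
 a toy model and not the actual kernel code: struct toy_vnode, first_idle()
 and toy_pick_victim() are hypothetical names, and the lists are plain arrays
 rather than the kernel's queues.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a vnode; only tracks whether it is busy. */
struct toy_vnode {
	bool busy;
};

/* Return the first vnode on the list that is not busy, or NULL. */
static struct toy_vnode *
first_idle(struct toy_vnode *list, size_t n)
{
	assert(list != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!list[i].busy)
			return &list[i];
	}
	return NULL;
}

/*
 * Pick a vnode to recycle.  Prefer the free list; if every vnode there
 * is busy, fall back to the hold list instead of giving up right away.
 * A NULL result models the allocation failure that is reported as
 * "vnode: table is full".
 */
static struct toy_vnode *
toy_pick_victim(struct toy_vnode *freelist, size_t nfree,
    struct toy_vnode *holdlist, size_t nhold)
{
	struct toy_vnode *vp = first_idle(freelist, nfree);

	if (vp == NULL)
		vp = first_idle(holdlist, nhold);
	return vp;
}
```

 in this model, allocation fails only when both lists contain nothing but
 busy vnodes, which is why the failure path becomes much rarer without the
 bug in that path being fixed.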
 
 in this dump, the vnode being processed by nfs_inactive() is actually an
 FFS vnode, so most likely it has been reused while this thread was sleeping.
 the vnode is not locked, even though it should be while VOP_INACTIVE() is
 still in progress.  vrele() puts the vnode on the free list before calling
 VOP_INACTIVE(), so the vnode being incorrectly unlocked while this thread
 was sleeping would also allow the vnode to be reused too early like this.
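
 the sequence above can be illustrated with a small toy model.  none of this
 is NetBSD code: struct toy_vnode, toy_getnewvnode() and toy_vrele() are
 hypothetical, the "buggy_unlock" flag stands in for whatever drops the lock
 early, and the direct call to the allocator stands in for another thread
 running while the inactive routine sleeps.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; the names imitate the kernel's but this is not kernel code. */
enum vtag { TAG_NONE, TAG_NFS, TAG_FFS };

struct toy_vnode {
	enum vtag tag;		/* which file system currently owns the vnode */
	bool locked;
	bool on_freelist;
};

/*
 * Hypothetical allocator: recycle vp for another file system, but only
 * if it is on the free list and nobody holds its lock.
 */
static bool
toy_getnewvnode(struct toy_vnode *vp, enum vtag newtag)
{
	if (!vp->on_freelist || vp->locked)
		return false;
	vp->on_freelist = false;
	vp->tag = newtag;
	return true;
}

/*
 * Hypothetical last-reference release.  The vnode goes on the free list
 * first, then it is locked and the inactive routine runs.  Returns the
 * tag that the inactive routine sees when it resumes after "sleeping".
 */
static enum vtag
toy_vrele(struct toy_vnode *vp, bool buggy_unlock)
{
	enum vtag seen;

	assert(!vp->on_freelist);
	vp->on_freelist = true;		/* on the free list before inactive */
	vp->locked = true;		/* lock is what protects it meanwhile */

	if (buggy_unlock)
		vp->locked = false;	/* lock incorrectly dropped */

	/* inactive routine "sleeps"; another thread wants a new vnode */
	(void)toy_getnewvnode(vp, TAG_FFS);

	seen = vp->tag;			/* inactive routine resumes */
	vp->locked = false;		/* normal unlock at the end */
	return seen;
}
```

 starting from an NFS vnode, toy_vrele(vp, false) returns TAG_NFS because the
 lock keeps the allocator away, while toy_vrele(vp, true) returns TAG_FFS:
 the NFS inactive routine ends up working on a vnode that now belongs to FFS,
 which matches what the dump shows.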
 
 there are some related fixes in revs 1.19 and 1.21 of layer_vnops.c:
 
 ----------------------------
 revision 1.21
 date: 2004/06/16 17:59:53;  author: wrstuden;  state: Exp;  lines: +5 -5
 Make sure we actually locked the parent vnode before we clear
 PDIRUNLOCK. The whole reason we have the flag is to note (rare)
 cases where we are supposed to have the parent directory locked
 but don't. Permits error handling code to know what to do with
 the parrent vnode (vrele() vs vput()).
 ----------------------------
 ...
 ----------------------------
 revision 1.19
 date: 2004/06/16 12:37:01;  author: yamt;  state: Exp;  lines: +14 -3
 missing error recover from layer_node_create failure.
 ----------------------------
 
 both of these changes should also be pulled up to 2.x.
 it doesn't look to me like these changes fix the problem in these PRs, though.
 without these changes, the symptom would have been vnodes left locked when
 they should have been unlocked, which is the opposite of what we're seeing
 here.
 
 -Chuck