Subject: Re: kern/32090: uvm_fault() after "vnode: table is full"
To: None <gnats-bugs@netbsd.org>
From: Chuck Silvers <chuq@chuq.com>
List: netbsd-bugs
Date: 11/18/2005 08:41:28
there are several other PRs about crashes after "vnode: table is full",
and all of them also involve a layered file system:

28705	2.0 kernel panic in layer_unlock with "vnode: table is full"
31979	panic: lockmgr: release of unlocked lock (layerfs)
29670	"release of unlocked lock" panic with null fs

the last one led to a tangentially-related improvement (falling back to
reclaiming a vnode from the hold list in the case that there are vnodes
on the free list but they're all busy).  this makes it much less likely
that this problem would be seen, but it doesn't fix the bug responsible
for the crash.  this change should be pulled up to the 2.x branches,
though.
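
as a rough illustration of that fallback, here is a minimal user-space
sketch.  it is a toy model, not the kernel code: struct toy_vnode,
toy_pick() and toy_getnewvnode() are made-up names, "busy" stands in for
whatever makes a vnode unusable at the moment, and the sketch also falls
back when the free list is simply empty, which the real allocator may
treat differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* toy stand-in for a vnode; NOT the real struct vnode */
struct toy_vnode {
	bool busy;			/* cannot be reclaimed right now */
	struct toy_vnode *next;		/* singly linked list link */
};

/* unlink and return the first non-busy vnode on the list, or NULL */
static struct toy_vnode *
toy_pick(struct toy_vnode **head)
{
	struct toy_vnode **pp;

	for (pp = head; *pp != NULL; pp = &(*pp)->next) {
		if (!(*pp)->busy) {
			struct toy_vnode *vp = *pp;

			*pp = vp->next;
			vp->next = NULL;
			return vp;
		}
	}
	return NULL;
}

/*
 * try the free list first; if nothing usable is there, fall back to
 * the hold list.  NULL models the "vnode: table is full" case.
 */
static struct toy_vnode *
toy_getnewvnode(struct toy_vnode **freelist, struct toy_vnode **holdlist)
{
	struct toy_vnode *vp;

	vp = toy_pick(freelist);
	if (vp == NULL)
		vp = toy_pick(holdlist);
	return vp;
}
```

with one busy vnode on the free list and one idle vnode on the hold list,
toy_getnewvnode() returns the hold-list vnode instead of failing.  it only
returns NULL once both lists have nothing usable.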

in this dump, the vnode being processed by nfs_inactive() is actually an
FFS vnode, so most likely it has been reused while this thread was sleeping.
the vnode is not locked, even though it should be while VOP_INACTIVE() is
still in progress.  vrele() puts the vnode on the freelist before calling
VOP_INACTIVE(), so the vnode being incorrectly unlocked while this thread
was sleeping would also allow the vnode to be reused too early like this.
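
the ordering problem can be sketched with a small user-space toy model.
everything here is hypothetical (struct toy_vnode, toy_try_reuse(),
toy_vrele_last()); it only encodes the reasoning above: the vnode goes on
the freelist first, and the lock is the only thing keeping it from being
recycled while the inactive routine sleeps.  the model simply refuses to
reuse a locked vnode, which is a simplification of what the kernel does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_fs { TOY_NFS, TOY_FFS };

/* toy stand-in for a vnode; NOT the real struct vnode */
struct toy_vnode {
	enum toy_fs fs;		/* file system that currently owns it */
	bool locked;		/* should stay true during inactive */
	bool on_freelist;	/* set before the inactive routine runs */
};

/* models the allocator: only an unlocked free-list vnode is recycled */
static bool
toy_try_reuse(struct toy_vnode *vp, enum toy_fs newfs)
{
	if (!vp->on_freelist || vp->locked)
		return false;
	vp->on_freelist = false;
	vp->fs = newfs;
	return true;
}

/*
 * models dropping the last reference.  bad_unlock models the (still
 * unidentified) bug that unlocks the vnode while the inactive routine
 * sleeps.  returns true if the vnode still belongs to its original
 * file system when the inactive routine resumes.
 */
static bool
toy_vrele_last(struct toy_vnode *vp, bool bad_unlock)
{
	enum toy_fs orig = vp->fs;

	vp->locked = true;
	vp->on_freelist = true;		/* freelist first, inactive second */

	/* the inactive routine sleeps here */
	if (bad_unlock)
		vp->locked = false;

	/* meanwhile another thread asks for a new vnode */
	(void)toy_try_reuse(vp, TOY_FFS);

	/* the inactive routine wakes up and keeps using vp */
	return vp->fs == orig;
}
```

with bad_unlock false the vnode is still an NFS vnode when the inactive
routine resumes.  with bad_unlock true it has become an FFS vnode
underneath the sleeping thread, which matches what the dump shows.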

there are some related fixes in revs 1.19 and 1.21 of layer_vnops.c:

----------------------------
revision 1.21
date: 2004/06/16 17:59:53;  author: wrstuden;  state: Exp;  lines: +5 -5
Make sure we actually locked the parent vnode before we clear
PDIRUNLOCK. The whole reason we have the flag is to note (rare)
cases where we are supposed to have the parent directory locked
but don't. Permits error handling code to know what to do with
the parent vnode (vrele() vs vput()).
----------------------------
...
----------------------------
revision 1.19
date: 2004/06/16 12:37:01;  author: yamt;  state: Exp;  lines: +14 -3
missing error recover from layer_node_create failure.
----------------------------

both of these changes should also be pulled up to 2.x.
it doesn't look to me like these changes fix the problem in these PRs, though.
without these changes, the symptom would have been vnodes left locked when
they should have been unlocked, which is the opposite of what we're seeing
here.

-Chuck