tech-kern archive
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]
Re: vget() patch
On Wed, Oct 28, 2009 at 06:57:36PM +0100, Manuel Bouyer wrote:
> Hi,
> unless someone objects I'll commit the attached patch and request
> pullup to netbsd-5 next week-end. This is he same as the one posted in the
> "panic due to lost vnode flag" thread.
> Althoug it doesn't fix this special issue, it seems to fix
> kern/41147 in a better way, and I also hope kern/42205.
> I've been running with this patch on several servers (and the previous
> fix for kern/41147 commented out as in the patch below) without
> problems; this is also being tested by others.
This patch cause a deadlock with layered fs:
the vrele kernel thread can sleep in vrelel() getting the vnode lock.
While it does this another thread may sleep in vget() waiting for
VI_INACTNOW to clear (well, not exactly, but it's looping on vget(NOWAIT)
in layer_node_find() which leads to the same result) and if this
is a vnode in a upper layer of a stacked FS, this thread may hold
the vnode lock because it's the same as the vnode lock from the
corresponding vnode in the lower layer (i.e. in layer_node_find(),
lowervp and the vp we call vget with share the same lock).
So the vrele thread will never get the vnode lock to VOP_INACTIVE
it and the loop in always restart because vget() fails because vp
is in VI_INACTNOW state.
A similar deadlock could happen in layer_node_find() with VI_XLOCK,
it's avoided by skipping vnode with VI_XLOCK set. We can't do the
same for VI_INACTNOW because when VI_INACTNOW is set we don't know
if the vnode will really be cleaned or not. Skipping VI_INACTNOW
vnode would lead to duplicate entries in the layer mount hash.
I think a better way to handle this is to not sleep with VI_INACTNOW
set in vrelel(), this is what the attached patch does.
Before trying to get the vnode lock (when allowed to sleep, in the
LK_NOWAIT case it's not an issue), we drop VI_INACTNOW.
Once we got the lock, we set VI_INACTNOW again and test that
we did not gain another reference while we were slepping. If we
did, we drop VI_INACTNOW, drop the vnode lock and exit from vrelel().
If we didn't gain a reference all is well, we set VI_INACTNOW again and
go on VOP_INACTIVEing the vnode.
Comments ?
--
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
NetBSD: 26 ans d'experience feront toujours la difference
--
Index: vfs_subr.c
===================================================================
RCS file: /cvsroot/src/sys/kern/vfs_subr.c,v
retrieving revision 1.357.4.7
diff -u -p -u -r1.357.4.7 vfs_subr.c
--- vfs_subr.c 21 Nov 2009 20:07:49 -0000 1.357.4.7
+++ vfs_subr.c 27 Nov 2009 18:57:41 -0000
@@ -1411,14 +1411,30 @@ vrelel(vnode_t *vp, int flags)
/* The pagedaemon can't wait around; defer. */
defer = true;
} else if (curlwp == vrele_lwp) {
- /* We have to try harder. */
- vp->v_iflag &= ~VI_INACTREDO;
+ /*
+ * We have to try harder. But we can't sleep
+ * with VI_INACTNOW as vget() may be waiting on it.
+ */
+ vp->v_iflag &= ~(VI_INACTREDO|VI_INACTNOW);
+ cv_broadcast(&vp->v_cv);
error = vn_lock(vp, LK_EXCLUSIVE | LK_INTERLOCK |
LK_RETRY);
if (error != 0) {
/* XXX */
vpanic(vp, "vrele: unable to lock %p");
}
+ mutex_enter(&vp->v_interlock);
+ /*
+ * if we did get another reference while
+ * sleeping, don't try to inactivate it yet.
+ */
+ if (__predict_false(vtryrele(vp))) {
+ VOP_UNLOCK(vp, 0);
+ mutex_exit(&vp->v_interlock);
+ return;
+ }
+ vp->v_iflag |= VI_INACTNOW;
+ mutex_exit(&vp->v_interlock);
defer = false;
} else if ((vp->v_iflag & VI_LAYER) != 0) {
/*
Home |
Main Index |
Thread Index |
Old Index