Subject: Re: reboot problems unmounting root
To: Bill Stouder-Studenmund <wrstuden@netbsd.org>
From: Antti Kantee <pooka@cs.hut.fi>
List: tech-kern
Date: 07/05/2007 23:50:38
On Thu Jul 05 2007 at 13:14:54 -0700, Bill Stouder-Studenmund wrote:
> > Yes.  But the only reason it explodes seems to be the lock pointer
> > exported from the vnode.  I, however, do maintain that you should revoke
> > all vnodes down from the upmost you revoked.  You just need to make the
> > one above know that it's a deadfs situation.  Actually, that's what the
> > code does now.  It just doesn't revoke any layerfs vnodes.
> 
> Why? What's the functional difference (assuming we fix this issue) between 
> a layer node accessing a dead vnode and a vnode that was a layer vnode 
> that now is a dead vnode? You get errors for all operations anyway.
> 
> I don't see one. And since we have to be able to handle a stack where 
> there are multiple mounts above one, we have to deal w/ "unexpected" 
> revoking of the underlying vnode. To do that means handling the case of a 
> live vnode on top of a dead one, and "handling" means returning buckets of 
> errors and not crashing the kernel.

Hmm, I thought had a very good reasoning for that, but I think I lost it.
Maybe I misreasoned.

Anyway, the CURRENT state is that ONLY the lower vnode is being revoked
because of layer_bypass().  The upper is kind of implicitly getting
revoked.  Maybe that should be changed to revoke only the upper one.

> Huh? I was talking about it being in the free list. vnodes on the free 
> list do NOT have VXLOCK set. :-)
> 
> Also, we don't leave VXLOCK there forever. :-)

I was talking about VOP_RECLAIM.  That's where we want to free the lock
if at all.  But n/m that part ;)

> > I don't see how the forced unmount situation is any different from a
> > revoke and therefore why revoke would need a special flag.
> 
> A forced unmount shouldn't be doing the same thing as a revoke. It should 
> just revoke the at-level vnode. The difference being force-unmounting a 
> layer shouldn't blast away the underlying vnodes.

Well, revoke is "check for device aliases + reclaim regardless of
use count".  A forced unmount is "reclaim regardless of use count".
I was just talking from the perspective of reclaim, once again.

> Revoke is different from unmount _if_ the leaf fs does things differently
> depending on if someone's still using the vnode or not. If it doesn't
> differentiate, then no biggie and no parameter. If however it does
> differentiate, the reason we need a flag is to tell the leaf fs that its
> use count lies.
> 
> The reason for the flag is to help the leaf fs. How exactly does the leaf
> fs know if there are users other than the revoker on the vnode? It looks
> at the use count and compares it to one. However if the VOP_REVOKE() came
> through a layered file system, that test isn't good. The one reference
> could well be the one held by a single upper layer, which gives you no
> indication of how may things hold the upper layer's vnode.
> 
> That's why I thought it'd matter. If however whatever leaf file systems do 
> doesn't care (works right either way), then we don't need said flag.

Ah, ic.

But now remind me why the revoke should be coming down at all?

> > The call sequence is approximately the following:
> > 
> > VOP_REVOKE
> > VOP_UNLOCK (vclean() doesn't call inactive because the vnode is still active)
> > VOP_RECLAIM
> > v_op set to deadfs
> > upper layer decides this one is not worth holding on to
> > vrele()
> > VOP_INACTIVE (for deadfs)
> > new start for the vnode
> 
> It's not clear here what's calling what.

VOP_REVOKE (generally) -> vgonel -> vclean -> (VOP_UNLOCK, VOP_RECLAIM,
sets v_op to deadfs)

another party: vrele() -> (VOP_INACTIVE(), put vnode on freelist)

> Also, unless we short-circuit it, there should be a reclaim when we 
> recycle the vnode.

No, there won't (technically).  The vnode now uses deadops and
dead_reclaim is nada.

> > Where do you want to free the lock structure?  We will never get another
> > VOP_RECLAIM for the original file system method.  Do we want to do it in
> > deadfs VOP_INACTIVE?  That's ugly, but it's the only place I see where to
> > do it.  It also needs an accompanying bit to say "hey, we already nuked
> > the lock in the file system reclaim op" in case the vnode is NOT revoked.
> 
> The lock structure is part of struct vnode, so we never have to "free" it. 
> We decomission (LK_DRAIN) it as part of reclaiming, but we can un-drain it 
> I believe. If we can't, we should figure out a way to.

free, drain, conceptually the same

> Something for deadfs VOP_INACTIVE would be good, namely to do something to 
> indicate "recycle me soon." Hmm, wait, that's a feature we haven't stollen 
> from Darwin yet. :-)

But dead_inactive is called only for vnodes which were forcibly reclaimed.
Others have VOP_INACTIVE called already when they are still attached to
their real file system.  I all inactive methods would need to be patched
to cope with this, and that add yet more difficult-to-comprehend side
effects to file system writing.

> > I'd prefer another approach where the whole layering lock export system
> > was redone instead of redoing reclaim to fix the layer lock exporting.
> > But I don't know how to fix layerfs locking ...
> 
> Do we care about the inerlock? The advantage of the current locking 
> behavior is that, since each layer has access to the lockmgr lock, we can 
> pass in the right interlock. If we either don't care, or we're fine with a 
> lockmgr call on vnode D releasing the interlock on vnode A (because 
> someone tried to lock A with its interlock held, and A is above B, which 
> is above C, which is above D), then we can just turn it all into recursive 
> VOP_LOCK() calls on the underlying vnodes.
> 
> Have you read Heidemann's disseration? The locking export is an attempt at 
> the shared-state stack stuff he did.

No.  I guess that's saying "I should" ;)

> We may just need to give up on the interlock, as there's not really any 
> way we can guarantee it for something like unionfs.

I actually had a similar idea and did a two-layer hack (it doesn't
recurse to the bottom of the stack, but that's easy).  It seems to fix
the problems.  I can't see it being any worse than the current state of
the art.


Index: kern/vfs_subr.c
===================================================================
RCS file: /cvsroot/src/sys/kern/vfs_subr.c,v
retrieving revision 1.288
diff -u -p -r1.288 vfs_subr.c
--- kern/vfs_subr.c	5 Jun 2007 12:31:31 -0000	1.288
+++ kern/vfs_subr.c	5 Jul 2007 20:01:06 -0000
@@ -1649,6 +1649,7 @@ vclean(struct vnode *vp, int flags, stru
 	 */
 	vp->v_op = dead_vnodeop_p;
 	vp->v_tag = VT_NON;
+	vp->v_vnlock = NULL;
 	simple_lock(&vp->v_interlock);
 	VN_KNOTE(vp, NOTE_REVOKE);	/* FreeBSD has this in vn_pollgone() */
 	vp->v_flag &= ~(VXLOCK|VLOCKSWORK);
Index: miscfs/genfs/layer_vnops.c
===================================================================
RCS file: /cvsroot/src/sys/miscfs/genfs/layer_vnops.c,v
retrieving revision 1.31
diff -u -p -r1.31 layer_vnops.c
--- miscfs/genfs/layer_vnops.c	16 Apr 2007 08:10:58 -0000	1.31
+++ miscfs/genfs/layer_vnops.c	5 Jul 2007 20:01:06 -0000
@@ -605,10 +605,11 @@ layer_lock(v)
 		int a_flags;
 		struct proc *a_p;
 	} */ *ap = v;
-	struct vnode *vp = ap->a_vp, *lowervp;
+	struct vnode *vp = ap->a_vp;
+	struct vnode *lowervp = LAYERVPTOLOWERVP(vp);
 	int	flags = ap->a_flags, error;
 
-	if (vp->v_vnlock != NULL) {
+	if (lowervp->v_vnlock != NULL) {
 		/*
 		 * The lower level has exported a struct lock to us. Use
 		 * it so that all vnodes in the stack lock and unlock
@@ -635,7 +637,6 @@ layer_lock(v)
 		 * first). But we can LK_DRAIN the upper lock as a step
 		 * towards decomissioning it.
 		 */
-		lowervp = LAYERVPTOLOWERVP(vp);
 		if (flags & LK_INTERLOCK) {
 			simple_unlock(&vp->v_interlock);
 			flags &= ~LK_INTERLOCK;
@@ -666,9 +667,10 @@ layer_unlock(v)
 		struct proc *a_p;
 	} */ *ap = v;
 	struct vnode *vp = ap->a_vp;
+	struct vnode *lowervp = LAYERVPTOLOWERVP(vp);
 	int	flags = ap->a_flags;
 
-	if (vp->v_vnlock != NULL) {
+	if (lowervp->v_vnlock != NULL) {
 		return (lockmgr(vp->v_vnlock, ap->a_flags | LK_RELEASE,
 			&vp->v_interlock));
 	} else {
@@ -676,7 +678,7 @@ layer_unlock(v)
 			simple_unlock(&vp->v_interlock);
 			flags &= ~LK_INTERLOCK;
 		}
-		VOP_UNLOCK(LAYERVPTOLOWERVP(vp), flags);
+		VOP_UNLOCK(lowervp, flags);
 		return (lockmgr(&vp->v_lock, flags | LK_RELEASE,
 			&vp->v_interlock));
 	}

-- 
Antti Kantee <pooka@iki.fi>                     Of course he runs NetBSD
http://www.iki.fi/pooka/                          http://www.NetBSD.org/
    "la qualité la plus indispensable du cuisinier est l'exactitude"