Subject: Re: pr/35143 and layer_node_find()
To: Bill Studenmund <wrstuden@netbsd.org>
From: Chuck Silvers <chuq@chuq.com>
List: tech-kern
Date: 12/09/2006 06:40:49
--u3/rZRmxL6MmkK24
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Wed, Dec 06, 2006 at 09:59:32PM -0800, Bill Studenmund wrote:
> > the attached patch fixes this problem, and with both this patch and
> > my other patch for the vrele() issue applied, a stress test survives
> > much longer than it previously took to trigger a hang.
> 
> What eventually kills it?

the test was still running when I sent the previous mail,
but I'm told it got anther vnlock deadlock yesterday.
(I'm not doing this testing myself.)  alas we didn't get a dump.
it ran for about three days before running into the latest problem.


> > any objection to going with this approach?
> 
> We DEFINITELY need this patch. How exactly did you figure this out? :-)

I had the tester apply the attached debug code and then looked at
the resulting dump.


> I still am uncomfortable with us calling vget() without LK_NOWAIT as it 
> seems like a deadlock waiting to happen, protected by the race in the 
> getnewvnode() test. Also, if we had a better way to recover from the 
> reaping case, we wouldn't need the test in getnewvnode(). :-)

I'd also like to remove the check in getnewvnode(), but I still
don't like having multiple vnodes with the same identity.

I'd like to think about other ways to deal with this.
the underlying problem is a combination of several factors:
 - the same "VOP_LOCK" is used to protect both vnode identity and
   vnode data contents (ie. directory entries, file data, etc).
 - the "VOP_LOCK" is shared between layers but the "XLOCK" isn't,
   thus the locking for vnode identity is partially shared by
   layered file systems and partially not shared.
 - this partial sharing results in vget() wanting to examine the
   identity locks (XLOCK and VOP_LOCK) in the wrong order, since a thread
   will already hold the VOP_LOCK when it wants to check for (and possibly
   sleep for) XLOCK.

I'd like to look into separating the locks for vnode identity vs.
vnode contents, so that the locking for vnode identity is never shared
between vnodes.


> However I agree totally with your patch on everything except this one 
> point, so it's probably better to get this into 4.0 and keep thinking than 
> to not.

ok, I'll go ahead and check in my stuff.


> You earlier sounded like you didn't like my thought (due to the 
> multi-instance aspect) but that you didn't object to it. Would you mind me 
> making that change after you check all this in (and after I get your 
> stress tool & run it)?

if there were any evidence that your change would fix an actual problem
that we couldn't find some better way to fix for 4.0, then I'd be ok with it.
but since there's no evidence of that at this point I'd rather not apply it.

-Chuck

--u3/rZRmxL6MmkK24
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="diff.nosleep"

Index: kern/kern_synch.c
===================================================================
RCS file: /home/chs/netbsd/cvs/src/sys/kern/kern_synch.c,v
retrieving revision 1.173
diff -u -p -r1.173 kern_synch.c
--- kern/kern_synch.c	3 Nov 2006 20:46:00 -0000	1.173
+++ kern/kern_synch.c	3 Dec 2006 16:19:12 -0000
@@ -932,6 +932,12 @@ mi_switch(struct lwp *l, struct lwp *new
 	struct proc *p = l->l_proc;
 	int retval;
 
+static uintptr_t chuq_sc;
+if (l->l_flag & L_NOSLEEP) {
+	panic("chuq: sleeping when we shouldn't\n");
+}
+l->l_emuldata = (void *)chuq_sc++;
+
 	SCHED_ASSERT_LOCKED();
 
 	/*
Index: kern/vfs_subr.c
===================================================================
RCS file: /home/chs/netbsd/cvs/src/sys/kern/vfs_subr.c,v
retrieving revision 1.276
diff -u -p -r1.276 vfs_subr.c
--- kern/vfs_subr.c	17 Nov 2006 17:05:18 -0000	1.276
+++ kern/vfs_subr.c	3 Dec 2006 19:00:32 -0000
@@ -268,10 +268,14 @@ try_nextlist:
 	simple_unlock(&vnode_free_list_slock);
 	vp->v_lease = NULL;
 
+if (vp->v_flag & VLAYER) {
+	l->l_flag |= L_NOSLEEP;
+}
 	if (vp->v_type != VBAD)
 		vgonel(vp, l);
 	else
 		simple_unlock(&vp->v_interlock);
+l->l_flag &= ~L_NOSLEEP;
 	vn_finished_write(mp, 0);
 #ifdef DIAGNOSTIC
 	if (vp->v_data || vp->v_uobj.uo_npages ||
@@ -1542,6 +1546,8 @@ vclean(struct vnode *vp, int flags, stru
 	 */
 	VOP_LOCK(vp, LK_DRAIN | LK_INTERLOCK);
 
+l->l_flag &= ~L_NOSLEEP;
+
 	/*
 	 * Clean out any cached data associated with the vnode.
 	 * If special device, remove it from special device alias list.
Index: sys/lwp.h
===================================================================
RCS file: /home/chs/netbsd/cvs/src/sys/sys/lwp.h,v
retrieving revision 1.46
diff -u -p -r1.46 lwp.h
--- sys/lwp.h	24 Oct 2006 10:05:45 -0000	1.46
+++ sys/lwp.h	3 Dec 2006 16:14:28 -0000
@@ -132,6 +132,8 @@ extern struct lwp lwp0;			/* LWP for pro
 #define	L_COWINPROGRESS	0x40000000 /* UFS: doing copy on write */
 #define	L_SA_SWITCHING	0x80000000 /* SA LWP in context switch */
 
+#define	L_NOSLEEP	0x00100000
+
 /*
  * Status values.
  *

--u3/rZRmxL6MmkK24--