Subject: Re: nfsd: locking botch in op %d
To: None <tech-kern@netbsd.org>
From: der Mouse <mouse@Rodents.Montreal.QC.CA>
List: tech-kern
Date: 03/13/2001 14:07:10
>>> In the case of an aliased device node, ufs_vinit calls vput() on
>>> the old vnode just before initializing the new one.
>> [...] ufs_vinit, before it vput()s the old vnode, bashes the
>> vnodeops field to specfs's vnodeops.  And specfs's unlock routine is
>> genfs_nounlock, which doesn't actually do anything.  This means that
>> the VOP_UNLOCK in vput() is a no-op.
> Oops!

:-)

Actually, I can't help wondering how this *ever* worked.  There was a
time when that diskless machine worked fine with my NFS server, and I
haven't changed anything since then that I can see affecting this.

> Could you try changing the genfs_no{,is,un}lock{,ed} calls into the
> real-lock varieties and see what happens?

I didn't try that, since I don't know what else uses those routines,
and some uses of them may depend on their semantics.  (If you really
want me to, I can try that, but it wouldn't surprise me if it broke
something else, something that depends on the genfs_no* behavior being
what the name implies.)

What I did do, and it seems to have made the problem go away, is this:

--- /sources/latest-usr-src/sys/ufs/ufs/ufs_vnops.c	Tue Mar  7 18:19:42 2000
+++ ufs_vnops.c	Tue Mar 13 01:00:13 2001
@@ -1917,6 +1917,9 @@
 			 */
 			nvp->v_data = vp->v_data;
 			vp->v_data = NULL;
+			/* With v_op bashed, vput's VOP_UNLOCK is a noop.
+			   But at this point vp is locked, so.... */
+			VOP_UNLOCK(vp,0);
 			vp->v_op = spec_vnodeop_p;
 			vput(vp);
 			vgone(vp);

I can't pretend to believe that this is the right fix.  But it made my
symptoms go away; if it does likewise for the other person who had the
problem, I'd be inclined to say it is the locking problem I outlined.
What the proper fix is, that question I'll leave up to those who
actually understand the code.
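
For anyone who wants to see the mechanism in isolation, here is a toy
userland model of it.  This is only a sketch, not kernel code: the names
(toy_vnode, toy_vput, alias_teardown, and so on) are made up for
illustration, and the "lock" is a plain int rather than a real lock.
It assumes only what is described above: the unlock done on the way out
is dispatched through the vnode's ops pointer, and the ops pointer is
swapped to a table whose unlock does nothing.

```c
/* Toy model only; every name here is hypothetical, not the kernel's. */
struct toy_vnode;

struct toy_vnodeops {
	void (*unlock)(struct toy_vnode *);
};

struct toy_vnode {
	const struct toy_vnodeops *v_op;	/* dispatch table */
	int locked;				/* 1 = held, 0 = free */
};

/* An unlock that really releases the lock. */
static void real_unlock(struct toy_vnode *vp) { vp->locked = 0; }

/* An unlock that does nothing, in the spirit of genfs_nounlock. */
static void no_unlock(struct toy_vnode *vp) { (void)vp; }

static const struct toy_vnodeops toy_ufs_ops = { real_unlock };
static const struct toy_vnodeops toy_spec_ops = { no_unlock };

/* Stands in for the unlock inside vput(): it goes through v_op. */
static void toy_vput(struct toy_vnode *vp) { vp->v_op->unlock(vp); }

/*
 * Model the aliased-node path: start with a locked vnode, swap its
 * ops table, then "vput" it.  If unlock_first is nonzero, unlock
 * through the original table before the swap, as the patch does.
 * Returns the lock state left behind.
 */
int alias_teardown(int unlock_first)
{
	struct toy_vnode vn = { &toy_ufs_ops, 1 };

	if (unlock_first)
		vn.v_op->unlock(&vn);	/* still the real unlock here */
	vn.v_op = &toy_spec_ops;	/* ops table bashed */
	toy_vput(&vn);			/* now a no-op unlock */
	return vn.locked;
}
```

In this model alias_teardown(0) returns 1 (the lock is left held,
which is the botch) and alias_teardown(1) returns 0.  It says nothing
about whether unlocking before the swap is the right fix in the real
code, only why the order matters.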

					der Mouse

			       mouse@rodents.montreal.qc.ca
		     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B