Subject: Re: locking bug in coda_lookup?
To: Greg Troxel <gdt@ir.bbn.com>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 09/07/2004 13:50:07
--Dxnq1zWXvFF0Q93v
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Tue, Aug 31, 2004 at 09:48:33AM -0400, Greg Troxel wrote:
> in coda.  I'm not certain which of two very similarfragments the code
> was in (don't have netbsd.gdb any more), but it was very much like
> this:
>=20
> 	    if (*ap->a_vpp) {
> 		if ((error =3D vn_lock(*ap->a_vpp, LK_EXCLUSIVE))) {
> 		    printf("coda_lookup: ");
> 		    panic("unlocked parent but couldn't lock child");
> 		}
> 	    }
>=20
> It seems that this will fail if someone else has the vnode locked, and
> there will be no retry.  From reading vn_lock, it seems that if one
> passes LK_NOWAIT, that on encountering a locked vnode, vn_lock returns
> immediately.  If LK_NOWAIT is not set, and LK_RETRY is also not set,
> it seems that vn_lock will tsleep on the vnode's v_interlock, and then
> return ENOENT instead of retrying.  The ufs code uses LK_RETRY in what
> I think is the analogous case.
>=20
> I don't understand why it makes sense to sleep and not retry the lock
> - if one isn't going to retry, what's the point of sleeping?
>=20
> So, I think that coda_lookup should pass LK_RETRY.  But I either don't
> quite or just barely understand vnode locking, so I'd appreciate
> advice here.

Sounds right. I'm not really sure why the lock manager does this, but=20
LK_RETRY sounds like what's needed.

> Also, it seems that the coda lookup operation doesn't properly handle
> IS_DOTDOT where the locking rules are different.  That could be
> related to my crash instead.

Yes, it looks like (at first glance) IS_DOTDOT isn't handled right. Though=
=20
getting this wrong won't lead to a crash, it will lead to a deadlock.

However given how the parent's unlocked then the child's locked unless
both LOCKPARENT and ISLASTCN are set, it will only matter if someone tries
to create, delete, or rename a path that ends with "..". i.e. "rm
/usr/local/bin/..".

I think the fix is to change the "(!(cnp->cn_flags & LOCKPARENT) ||=20
!(cnp->cn_flags & ISLASTCN))" clause to also unlock dvp if IS_DOTDOT is=20
set, and then after locking the child to add a relock routine to lock dvp=
=20
if IS_DOTDOT is set and we wouldn't have otherwise unlocked dvp.

I think the right conditionals are:

(!(cnp->cn_flags & LOCKPARENT) || !(cnp->cn_flags & ISLASTCN) ||=20
(cnp->cn_flags & IS_DOTDOT))

for the (existing) unlock and

((cnp->cn_flags & IS_DOTDOT) && !(!(cnp->cn_flags & LOCKPARENT) ||=20
!(cnp->cn_flags & ISLASTCN)))

for the second (new) test, the relock dvp case. Note I have not simplified=
=20
the negatives in the above test. I think the above's the same as:

((cnp->cn_flags & IS_DOTDOT) && (cnp->cn_flags & LOCKPARENT) &&=20
(cnp->cn_flags & ISLASTCN))

if I did my boolean math right.

Take care,

Bill

--Dxnq1zWXvFF0Q93v
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (NetBSD)

iD8DBQFBPh7/Wz+3JHUci9cRAuflAJ4r3CW7n/cLg8QsbFWFBEAQdf3fQQCdE36+
0eH1JWaT83q4ifvXMyGcadE=
=cJMa
-----END PGP SIGNATURE-----

--Dxnq1zWXvFF0Q93v--