Subject: Re: locking bug in coda_lookup?
To: Greg Troxel <gdt@ir.bbn.com>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 09/07/2004 13:50:07
--Dxnq1zWXvFF0Q93v
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
On Tue, Aug 31, 2004 at 09:48:33AM -0400, Greg Troxel wrote:
> in coda. I'm not certain which of two very similarfragments the code
> was in (don't have netbsd.gdb any more), but it was very much like
> this:
>=20
> if (*ap->a_vpp) {
> if ((error =3D vn_lock(*ap->a_vpp, LK_EXCLUSIVE))) {
> printf("coda_lookup: ");
> panic("unlocked parent but couldn't lock child");
> }
> }
>=20
> It seems that this will fail if someone else has the vnode locked, and
> there will be no retry. From reading vn_lock, it seems that if one
> passes LK_NOWAIT, that on encountering a locked vnode, vn_lock returns
> immediately. If LK_NOWAIT is not set, and LK_RETRY is also not set,
> it seems that vn_lock will tsleep on the vnode's v_interlock, and then
> return ENOENT instead of retrying. The ufs code uses LK_RETRY in what
> I think is the analogous case.
>=20
> I don't understand why it makes sense to sleep and not retry the lock
> - if one isn't going to retry, what's the point of sleeping?
>=20
> So, I think that coda_lookup should pass LK_RETRY. But I either don't
> quite or just barely understand vnode locking, so I'd appreciate
> advice here.
Sounds right. I'm not really sure why the lock manager does this, but=20
LK_RETRY sounds like what's needed.
> Also, it seems that the coda lookup operation doesn't properly handle
> IS_DOTDOT where the locking rules are different. That could be
> related to my crash instead.
Yes, it looks like (at first glance) IS_DOTDOT isn't handled right. Though=
=20
getting this wrong won't lead to a crash, it will lead to a deadlock.
However given how the parent's unlocked then the child's locked unless
both LOCKPARENT and ISLASTCN are set, it will only matter if someone tries
to create, delete, or rename a path that ends with "..". i.e. "rm
/usr/local/bin/..".
I think the fix is to change the "(!(cnp->cn_flags & LOCKPARENT) ||=20
!(cnp->cn_flags & ISLASTCN))" clause to also unlock dvp if IS_DOTDOT is=20
set, and then after locking the child to add a relock routine to lock dvp=
=20
if IS_DOTDOT is set and we wouldn't have otherwise unlocked dvp.
I think the right conditionals are:
(!(cnp->cn_flags & LOCKPARENT) || !(cnp->cn_flags & ISLASTCN) ||=20
(cnp->cn_flags & IS_DOTDOT))
for the (existing) unlock and
((cnp->cn_flags & IS_DOTDOT) && !(!(cnp->cn_flags & LOCKPARENT) ||=20
!(cnp->cn_flags & ISLASTCN)))
for the second (new) test, the relock dvp case. Note I have not simplified=
=20
the negatives in the above test. I think the above's the same as:
((cnp->cn_flags & IS_DOTDOT) && (cnp->cn_flags & LOCKPARENT) &&=20
(cnp->cn_flags & ISLASTCN))
if I did my boolean math right.
Take care,
Bill
--Dxnq1zWXvFF0Q93v
Content-Type: application/pgp-signature
Content-Disposition: inline
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (NetBSD)
iD8DBQFBPh7/Wz+3JHUci9cRAuflAJ4r3CW7n/cLg8QsbFWFBEAQdf3fQQCdE36+
0eH1JWaT83q4ifvXMyGcadE=
=cJMa
-----END PGP SIGNATURE-----
--Dxnq1zWXvFF0Q93v--