Subject: Re: locking bug in coda_lookup?
To: Greg Troxel <>
From: Bill Studenmund <>
List: tech-kern
Date: 09/07/2004 13:50:07
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Tue, Aug 31, 2004 at 09:48:33AM -0400, Greg Troxel wrote:
> in coda.  I'm not certain which of two very similarfragments the code
> was in (don't have netbsd.gdb any more), but it was very much like
> this:
> 	    if (*ap->a_vpp) {
> 		if ((error =3D vn_lock(*ap->a_vpp, LK_EXCLUSIVE))) {
> 		    printf("coda_lookup: ");
> 		    panic("unlocked parent but couldn't lock child");
> 		}
> 	    }
> It seems that this will fail if someone else has the vnode locked, and
> there will be no retry.  From reading vn_lock, it seems that if one
> passes LK_NOWAIT, that on encountering a locked vnode, vn_lock returns
> immediately.  If LK_NOWAIT is not set, and LK_RETRY is also not set,
> it seems that vn_lock will tsleep on the vnode's v_interlock, and then
> return ENOENT instead of retrying.  The ufs code uses LK_RETRY in what
> I think is the analogous case.
> I don't understand why it makes sense to sleep and not retry the lock
> - if one isn't going to retry, what's the point of sleeping?
> So, I think that coda_lookup should pass LK_RETRY.  But I either don't
> quite or just barely understand vnode locking, so I'd appreciate
> advice here.

Sounds right. I'm not really sure why the lock manager does this, but=20
LK_RETRY sounds like what's needed.

> Also, it seems that the coda lookup operation doesn't properly handle
> IS_DOTDOT where the locking rules are different.  That could be
> related to my crash instead.

Yes, it looks like (at first glance) IS_DOTDOT isn't handled right. Though=
getting this wrong won't lead to a crash, it will lead to a deadlock.

However given how the parent's unlocked then the child's locked unless
both LOCKPARENT and ISLASTCN are set, it will only matter if someone tries
to create, delete, or rename a path that ends with "..". i.e. "rm

I think the fix is to change the "(!(cnp->cn_flags & LOCKPARENT) ||=20
!(cnp->cn_flags & ISLASTCN))" clause to also unlock dvp if IS_DOTDOT is=20
set, and then after locking the child to add a relock routine to lock dvp=
if IS_DOTDOT is set and we wouldn't have otherwise unlocked dvp.

I think the right conditionals are:

(!(cnp->cn_flags & LOCKPARENT) || !(cnp->cn_flags & ISLASTCN) ||=20
(cnp->cn_flags & IS_DOTDOT))

for the (existing) unlock and

((cnp->cn_flags & IS_DOTDOT) && !(!(cnp->cn_flags & LOCKPARENT) ||=20
!(cnp->cn_flags & ISLASTCN)))

for the second (new) test, the relock dvp case. Note I have not simplified=
the negatives in the above test. I think the above's the same as:

((cnp->cn_flags & IS_DOTDOT) && (cnp->cn_flags & LOCKPARENT) &&=20
(cnp->cn_flags & ISLASTCN))

if I did my boolean math right.

Take care,


Content-Type: application/pgp-signature
Content-Disposition: inline

Version: GnuPG v1.2.3 (NetBSD)