Subject: Re: vnode locking in coda_lookup
To: Bill Stouder-Studenmund <wrstuden@netbsd.org>
From: Greg Troxel <gdt@ir.bbn.com>
List: tech-kern
Date: 04/05/2007 08:44:01

> Simply put, any time you have locks, you have to have a locking order.
> Since most lookups progress from root down, the locking order we have is
> from the root vnode down. Since ".." really is a vnode higher up, you
> can't lock ".." while you have a lock on your directory as that's not
> locking from the top down. So you release your lock, lock "..", then
> re-lock ".".

Thanks - I do follow that we need to have an order to avoid deadlocks,
and what our order is.  What's not 100% clear in my head is the exact
relationship between the interlock and the vnode lock, which of them
has to be held for each possible operation, and especially what can
happen after vref but without holding the vnode lock.
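
To make sure I follow, here's a minimal sketch of the ".." dance as I
read it in ufs_lookup (parent_ino stands in for whatever inode number
the ".." entry names, and the error paths are abbreviated):

    /*
     * ".." dance: dvp is locked, but ".." is above us in the locking
     * order, so drop our lock before taking the parent's.
     */
    if (cnp->cn_flags & ISDOTDOT) {
            VOP_UNLOCK(dvp, 0);             /* step out of order */
            cnp->cn_flags |= PDIRUNLOCK;
            error = VFS_VGET(dvp->v_mount, parent_ino, &tdp);
            if (error)
                    return (error);         /* dvp left unlocked */
            /* tdp comes back locked; now re-take dvp, top down */
            if (vn_lock(dvp, LK_EXCLUSIVE | LK_RETRY) == 0)
                    cnp->cn_flags &= ~PDIRUNLOCK;
    }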

>>   for i in `cd /usr/bin && ls`; do cp -pf /usr/bin/$i ../$i & done
>> and
>>   for i in `cd .. && ls`; do cat ../$i  > /dev/null & done
>
> How many concurrent processes will this fire up? If you used to die at
> this and now don't, that's an improvement.

Hundreds.   I didn't save my script from before, but I think that's what
I was doing.

>> I know that wrstuden@ in the past said that LK_RETRY was not correct.
>
> I personally think it's vile for this.

Without LK_RETRY, and maybe even with, the caller has to check the
vn_lock return value and do something.  In vfs_lookup, if we fail to
lock the child, we can return an error.  But, since we have a vref at
this point, it seems this should never happen, so maybe it should be a
panic.

In the case of relocking the parent, we also have a vref (or rather,
our caller must hold one), so can that ever reasonably fail?  Should I
panic if vn_lock (w/o LK_RETRY) fails?
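
Concretely, I'm contemplating something like this (just a sketch;
whether to panic instead of returning is exactly what I'm unsure of):

    /*
     * Lock the child without LK_RETRY and actually check the result.
     * We hold a reference on vp, so failure here would be surprising.
     */
    error = vn_lock(vp, LK_EXCLUSIVE);
    if (error) {
            vrele(vp);
            *ap->a_vpp = NULL;
            return (error);  /* or panic("coda_lookup: child lock")? */
    }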

>> ufs_lookup does it, and I'm still unclear on this.  I do have a trusty
>> Thinkpad 600E crashbox for testing.
>
> However doing what ufs does isn't too bad; you're not making a worse mess.
> :-)

Maybe sometime someone can remove the LK_RETRY from ufs_lookup and run
filesystem tests...


>> +    /*
>> +     * XXX componentname flags in MODMASK are not handled at all
>> +     */
>> +
>> +    /*
>> +     * The overall strategy is to switch on the lookup type and get a
>> +     * result vnode that is vref'd but not locked.  Then, the code at
>
> You NEVER grab a lock while getting the new vnode? If so, then this is ok.
> If you grab an intermediate lock, you probably should do the ISDOTDOT
> dance.

Yes, that's how the code goes.  It uses coda's name cache or venus, and
those come back with a ref but no locks.
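
In outline, the shape is something like this (a sketch, not the
literal code; I'm going from memory of coda_nc_lookup's signature, and
whether a minicache hit already holds a reference is one of the things
I still need to verify):

    /* Produce *ap->a_vpp referenced but unlocked. */
    int isdot = (len == 1 && nm[0] == '.');    /* shorthand */
    if (isdot) {
            vref(dvp);                 /* "." is the directory itself */
            *ap->a_vpp = dvp;
    } else if ((cp = coda_nc_lookup(VTOC(dvp), nm, len, cred)) != NULL) {
            *ap->a_vpp = CTOV(cp);     /* minicache hit */
            vref(*ap->a_vpp);
    } else {
            /* round-trip to venus; it returns a ref'd, unlocked cnode */
    }
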
>> +    /*
>> +     * Try to resolve the lookup in the minicache.  If that fails, ask
>> +     * venus to do the lookup.  XXX The interaction between vnode
>> +     * locking and coda locking to avoid deadlocks on .. lookups is
>> +     * not clear.
>
> Pick an ordering and stick with it. It doesn't matter what it is, beyond
> the fact some orderings are easier to work with than others.
>
> The same ordering as VFS is probably good, though.

I agree in general; the problem is that I don't understand how coda is
dealing with locking in the non-vfs part of the code.

>> +	    /*
>> +	     * Unless this vnode is marked CODA_NOCACHE, enter it into
>> +	     * the coda name cache to avoid a future venus round-trip.
>> +	     * XXX Interaction with componentname NOCACHE is unclear.
>
> My thought is that having either one of them set should inhibit caching.
> When is CODA_NOCACHE set?

Venus sets it before returning a cnode.

I don't understand if the coda namecache is merely duplicative of the
regular name_cache, or somehow different.
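
If I adopt that, the test below would become something like (untested):

    /*
     * Skip the coda namecache entry if either venus (CODA_NOCACHE)
     * or the caller (componentname NOCACHE) asked for no caching.
     */
    if (!(vtype & CODA_NOCACHE) && !(cnp->cn_flags & NOCACHE))
            coda_nc_enter(VTOC(dvp), nm, len, cred, VTOC(*vpp));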

>> +	     */
>>  	    if (!(vtype & CODA_NOCACHE))
>>  		coda_nc_enter(VTOC(dvp), nm, len, cred, VTOC(*vpp));
>>  	}
>> @@ -978,6 +996,11 @@ coda_lookup(void *v)
>>  	cnp->cn_flags |= SAVENAME;
>>  	*ap->a_vpp = NULL;
>>      }
>> +    /*
>> +     * XXX If creating and we found something, does that need to be an
>> +     * error?
>> +     */
>
> Not necessarily. All of the CREATE/RENAME/DELETE stuff is an optimization.
> The idea is that if we're creating, we have to look through the directory
> to see if the name already exists, and we also have to look to see where
> there's a free slot to insert the new name. Adding CREATE is all about
> compacting that into one check. We keep the parent directory locked so
> that no one can use the empty slot we found before we do.
>
> Likewise RENAME and DELETE are optimizations for those ops. If we're about
> to rename something, it makes no sense to add it to the name cache.
>
> Also, Chuq changed how all of this works recently. I'm not 100% on the new
> stuff, but he simplified it all.

OK, will let this be for now, and drop the comment.  I'm adding XXX to
check this behavior against the new simplified rules.

>>      /*
>>       * If we are removing, and we are at the last element, and we
>> @@ -997,17 +1020,25 @@ coda_lookup(void *v)
>>      }
>>
>>      /*
>> -     * If the lookup went well, we need to lock the child.
>> +     * If the lookup succeeded, we must generally lock the returned
>> +     * vnode.  This could be a ., .., or normal lookup.  See
>> +     * vnodeops(9) for the details.
>
> I really think you need to start this dance sooner.

Because it's unsafe to be holding just a vref without a lock?  Or
because you think this might deadlock?  Or something else?
(Sorry, I'm really beyond my level of solid understanding here.)
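
For reference, the end-of-lookup locking I have now looks roughly like
this (paraphrased from my patch, not the literal diff):

    /*
     * *ap->a_vpp is referenced but unlocked; dvp is locked by our
     * caller.  For ".." drop dvp first so we lock from the top down,
     * then re-take it.
     */
    vp = *ap->a_vpp;
    if (error == 0 && vp != NULL && vp != dvp) {
            if (cnp->cn_flags & ISDOTDOT) {
                    VOP_UNLOCK(dvp, 0);
                    error = vn_lock(vp, LK_EXCLUSIVE);
                    vn_lock(dvp, LK_EXCLUSIVE | LK_RETRY);
            } else {
                    error = vn_lock(vp, LK_EXCLUSIVE);
            }
    }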

> Looks better though!!

OK, I will commit this, and work from there.  The old code is clearly
wrong, and this works.

Thanks for the review, comments and tutorial.

Is anyone else running coda?  If so, I'd appreciate hearing from you.


