Subject: Re: vnode locking in coda_lookup
To: Greg Troxel <gdt@ir.bbn.com>
From: Bill Stouder-Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 04/04/2007 18:13:48
--2oS5YaxWCcQjTEyO
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
On Wed, Apr 04, 2007 at 07:38:14PM -0400, Greg Troxel wrote:
> I have attempted to fix vnode locking in coda's lookup routine.
> Basically, coda_lookup (in sys/coda/coda_vnops.c) never followed the
> special rules for IS_DOTDOT. I've read the rules, and looked at
> ufs_lookup, but I'm not sure this is right. It does survive=20
Simply put, any time you have locks, you have to have a locking order.=20
Since most lookups progress from root down, the locking order we have is=20
from the root vnode down. Since ".." really is a vnode higher up, you=20
can't lock ".." whie you have a lock on your directory as that's not=20
locking from the top down. So you release your lock, lock "..", then=20
re-lock ".".
> for i in `cd /usr/bin && ls`; do cp -pf /usr/bin/$i ../$i & done
> and=20
> for i in `cd .. && ls`; do cat ../$i > /dev/null & done
How many concurrent processes will this fire up? If you used to die at=20
this and now don't, that's an improvement.
> I have also cleaned up and rewritten comments, some of which probably
> date from Mach days.
>=20
> I know that wrstuden@ in the past said that LK_RETRY was not correct.
I personally think it's vile for this.
> ufs_lookup does it, and I'm still unclear on this. I do have a trusty
> Thinkpad 600E crashbox for testing.
However doing what ufs does isn't too bad; you're not making a worse mess.=
=20
:-)
> So two questions:
>=20
> is this correct?
>=20
> is this clearly better than what was there?
>=20
>=20
>=20
> --- coda_vnops.c.~1.52.~ 2007-03-12 14:44:43.000000000 -0400
> +++ coda_vnops.c 2007-04-04 19:26:49.000000000 -0400
> @@ -885,22 +885,20 @@ coda_lookup(void *v)
> {
> /* true args */
> struct vop_lookup_args *ap =3D v;
> + /* (locked) vnode of dir in which to do lookup */
> struct vnode *dvp =3D ap->a_dvp;
> struct cnode *dcp =3D VTOC(dvp);
> + /* output variable for result */
> struct vnode **vpp =3D ap->a_vpp;
> - /*
> - * It looks as though ap->a_cnp->ni_cnd->cn_nameptr holds the rest
> - * of the string to xlate, and that we must try to get at least
> - * ap->a_cnp->ni_cnd->cn_namelen of those characters to macth. I
> - * could be wrong.
> - */
> - struct componentname *cnp =3D ap->a_cnp;
> + /* name to lookup */
> + struct componentname *cnp =3D ap->a_cnp;
> kauth_cred_t cred =3D cnp->cn_cred;
> struct lwp *l =3D cnp->cn_lwp;
> /* locals */
> struct cnode *cp;
> const char *nm =3D cnp->cn_nameptr;
> int len =3D cnp->cn_namelen;
> + int flags =3D cnp->cn_flags;
> CodaFid VFid;
> int vtype;
> int error =3D 0;
> @@ -910,6 +908,17 @@ coda_lookup(void *v)
> CODADEBUG(CODA_LOOKUP, myprintf(("lookup: %s in %s\n",
> nm, coda_f2s(&dcp->c_fid))););
> =20
> + /*
> + * XXX componentname flags in MODMASK are not handled at all */
> + */
> +
> + /*
> + * The overall strategy is to switch on the lookup type and get a
> + * result vnode that is vref'd but not locked. Then, the code at
You NEVER grab a lock while getting the new vnode? If so, then this is ok.=
=20
If you grab an intermediate lock, you probably should do the ISDOTDOT=20
dance.
> + * exit: switches on ., .., and regular lookups and does the right
> + * locking.
> + */
> +
> /* Check for lookup of control object. */
> if (IS_CTL_NAME(dvp, nm, len)) {
> *vpp =3D coda_ctlvp;
> @@ -918,6 +927,7 @@ coda_lookup(void *v)
> goto exit;
> }
> =20
> + /* Avoid trying to hand venus an unreasonably long name. */
> if (len+1 > CODA_MAXNAMLEN) {
> MARK_INT_FAIL(CODA_LOOKUP_STATS);
> CODADEBUG(CODA_LOOKUP, myprintf(("name too long: lookup, %s (%s)\n",
> @@ -926,8 +936,13 @@ coda_lookup(void *v)
> error =3D EINVAL;
> goto exit;
> }
> - /* First try to look the file up in the cfs name cache */
> - /* lock the parent vnode? */
> +
> + /*
> + * Try to resolve the lookup in the minicache. If that fails, ask
> + * venus to do the lookup. XXX The interaction between vnode
> + * locking and coda locking to avoid deadlocks on .. lookups is
> + * not clear.
Pick an ordering and stick with it. It doesn't matter what it is, beyond=20
the fact some orderings are easier to work with than others.
The same ordering as VFS is probably good, though.
> + */
> cp =3D coda_nc_lookup(dcp, nm, len, cred);
> if (cp) {
> *vpp =3D CTOV(cp);
> @@ -935,8 +950,7 @@ coda_lookup(void *v)
> CODADEBUG(CODA_LOOKUP,
> myprintf(("lookup result %d vpp %p\n",error,*vpp));)
> } else {
> -
> - /* The name wasn't cached, so we need to contact Venus */
> + /* The name wasn't cached, so ask Venus. */
> error =3D venus_lookup(vtomi(dvp), &dcp->c_fid, nm, len, cred, l, &VFid=
, &vtype);
> =20
> if (error) {
> @@ -952,9 +966,13 @@ coda_lookup(void *v)
> =20
> cp =3D make_coda_node(&VFid, dvp->v_mount, vtype);
> *vpp =3D CTOV(cp);
> + /* vpp is now vrefed. */
> =20
> - /* enter the new vnode in the Name Cache only if the top bit isn't =
set */
> - /* And don't enter a new vnode for an invalid one! */
> + /*
> + * Unless this vnode is marked CODA_NOCACHE, enter it into
> + * the coda name cache to avoid a future venus round-trip.
> + * XXX Interaction with componentname NOCACHE is unclear.
My thought is that having either one of them set should inhibit caching.=20
When is CODA_NOCACHE set?
> + */
> if (!(vtype & CODA_NOCACHE))
> coda_nc_enter(VTOC(dvp), nm, len, cred, VTOC(*vpp));
> }
> @@ -978,6 +996,11 @@ coda_lookup(void *v)
> cnp->cn_flags |=3D SAVENAME;
> *ap->a_vpp =3D NULL;
> }
> + /*
> + * XXX If creating and we found something, does that need to be an
> + * error?
> + */
Not necessarily. All of the CREATE/RENAME/DELETE stuff is an optimization.=
=20
The idea is that if we're creating, we have to look through the directory=
=20
to see if the name already exits, and we also have to look to see where=20
there's a free slot to insert the new name. Adding CREATE is all about=20
compacting that into one check. We keep the parent directory locked so=20
that no one can use the empty slot we found before we do.
Likewise RENAME and DELETE are optimizations for those ops. If we're about=
=20
to rename something, it makes no sense to add it to the name cache.
Also, Chuq changed how all of this works recently. I'm not 100% on the new=
=20
stuff, but he simplified it all.
> /*
> * If we are removing, and we are at the last element, and we
> @@ -997,17 +1020,25 @@ coda_lookup(void *v)
> }
> =20
> /*
> - * If the lookup went well, we need to lock the child.
> + * If the lookup succeeded, we must generally lock the returned
> + * vnode. This could be a ., .., or normal lookup. See
> + * vnodeops(9) for the details.
I really think you need to start this dance sooner.
> */
> if (!error || (error =3D=3D EJUSTRETURN)) {
> - /* The parent is locked, and may be the same as the child */
> + /* Lookup has a value and it isn't "."? */
> if (*ap->a_vpp && (*ap->a_vpp !=3D dvp)) {
> - /* Different, go ahead and lock it. */
> - vn_lock(*ap->a_vpp, LK_EXCLUSIVE|LK_RETRY);
> + if (flags & ISDOTDOT)
> + /* ..: unlock parent */
> + VOP_UNLOCK(dvp, 0);
> + /* all but .: lock child */
> + vn_lock(*ap->a_vpp, LK_EXCLUSIVE | LK_RETRY);
> + if (flags & ISDOTDOT)
> + /* ..: relock parent */
> + vn_lock(dvp, LK_EXCLUSIVE | LK_RETRY);
> }
> + /* else .: leave dvp locked */
> } else {
> - /* If the lookup failed, we need to ensure that the leaf is NULL */
> - /* Don't change any locking? */
> + /* The lookup failed, so return NULL. Leave dvp locked. */
> *ap->a_vpp =3D NULL;
> }
> return(error);
Looks better though!!
Take care,
Bill
--2oS5YaxWCcQjTEyO
Content-Type: application/pgp-signature
Content-Disposition: inline
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (NetBSD)
iD8DBQFGFE1MWz+3JHUci9cRAtadAJ9L/x0GiFL6A1cxpw1OEgJ+hRY9YgCfePtO
VWgUFu+F6e+0l356YJsrgN4=
=vW3C
-----END PGP SIGNATURE-----
--2oS5YaxWCcQjTEyO--