Subject: Re: vnode locking in coda_lookup
To: Greg Troxel <gdt@ir.bbn.com>
From: Bill Stouder-Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 04/04/2007 18:13:48
--2oS5YaxWCcQjTEyO
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Wed, Apr 04, 2007 at 07:38:14PM -0400, Greg Troxel wrote:
> I have attempted to fix vnode locking in coda's lookup routine.
> Basically, coda_lookup (in sys/coda/coda_vnops.c) never followed the
> special rules for IS_DOTDOT.  I've read the rules, and looked at
> ufs_lookup, but I'm not sure this is right.  It does survive=20

Simply put, any time you have locks, you have to have a locking order.=20
Since most lookups progress from root down, the locking order we have is=20
from the root vnode down. Since ".." really is a vnode higher up, you=20
can't lock ".." whie you have a lock on your directory as that's not=20
locking from the top down. So you release your lock, lock "..", then=20
re-lock ".".

>   for i in `cd /usr/bin && ls`; do cp -pf /usr/bin/$i ../$i & done
> and=20
>   for i in `cd .. && ls`; do cat ../$i  > /dev/null & done

How many concurrent processes will this fire up? If you used to die at=20
this and now don't, that's an improvement.

> I have also cleaned up and rewritten comments, some of which probably
> date from Mach days.
>=20
> I know that wrstuden@ in the past said that LK_RETRY was not correct.

I personally think it's vile for this.

> ufs_lookup does it, and I'm still unclear on this.  I do have a trusty
> Thinkpad 600E crashbox for testing.

However doing what ufs does isn't too bad; you're not making a worse mess.=
=20
:-)

> So two questions:
>=20
>   is this correct?
>=20
>   is this clearly better than what was there?
>=20
>=20
>=20
> --- coda_vnops.c.~1.52.~	2007-03-12 14:44:43.000000000 -0400
> +++ coda_vnops.c	2007-04-04 19:26:49.000000000 -0400
> @@ -885,22 +885,20 @@ coda_lookup(void *v)
>  {
>  /* true args */
>      struct vop_lookup_args *ap =3D v;
> +    /* (locked) vnode of dir in which to do lookup */
>      struct vnode *dvp =3D ap->a_dvp;
>      struct cnode *dcp =3D VTOC(dvp);
> +    /* output variable for result */
>      struct vnode **vpp =3D ap->a_vpp;
> -    /*
> -     * It looks as though ap->a_cnp->ni_cnd->cn_nameptr holds the rest
> -     * of the string to xlate, and that we must try to get at least
> -     * ap->a_cnp->ni_cnd->cn_namelen of those characters to macth.  I
> -     * could be wrong.
> -     */
> -    struct componentname  *cnp =3D ap->a_cnp;
> +    /* name to lookup */
> +    struct componentname *cnp =3D ap->a_cnp;
>      kauth_cred_t cred =3D cnp->cn_cred;
>      struct lwp *l =3D cnp->cn_lwp;
>  /* locals */
>      struct cnode *cp;
>      const char *nm =3D cnp->cn_nameptr;
>      int len =3D cnp->cn_namelen;
> +    int flags =3D cnp->cn_flags;
>      CodaFid VFid;
>      int	vtype;
>      int error =3D 0;
> @@ -910,6 +908,17 @@ coda_lookup(void *v)
>      CODADEBUG(CODA_LOOKUP, myprintf(("lookup: %s in %s\n",
>  				   nm, coda_f2s(&dcp->c_fid))););
> =20
> +    /*
> +     * XXX componentname flags in MODMASK are not handled at all */
> +     */
> +
> +    /*
> +     * The overall strategy is to switch on the lookup type and get a
> +     * result vnode that is vref'd but not locked.  Then, the code at

You NEVER grab a lock while getting the new vnode? If so, then this is ok.=
=20
If you grab an intermediate lock, you probably should do the ISDOTDOT=20
dance.

> +     * exit: switches on ., .., and regular lookups and does the right
> +     * locking.
> +     */
> +
>      /* Check for lookup of control object. */
>      if (IS_CTL_NAME(dvp, nm, len)) {
>  	*vpp =3D coda_ctlvp;
> @@ -918,6 +927,7 @@ coda_lookup(void *v)
>  	goto exit;
>      }
> =20
> +    /* Avoid trying to hand venus an unreasonably long name. */
>      if (len+1 > CODA_MAXNAMLEN) {
>  	MARK_INT_FAIL(CODA_LOOKUP_STATS);
>  	CODADEBUG(CODA_LOOKUP, myprintf(("name too long: lookup, %s (%s)\n",
> @@ -926,8 +936,13 @@ coda_lookup(void *v)
>  	error =3D EINVAL;
>  	goto exit;
>      }
> -    /* First try to look the file up in the cfs name cache */
> -    /* lock the parent vnode? */
> +
> +    /*
> +     * Try to resolve the lookup in the minicache.  If that fails, ask
> +     * venus to do the lookup.  XXX The interaction between vnode
> +     * locking and coda locking to avoid deadlocks on .. lookups is
> +     * not clear.

Pick an ordering and stick with it. It doesn't matter what it is, beyond=20
the fact some orderings are easier to work with than others.

The same ordering as VFS is probably good, though.

> +     */
>      cp =3D coda_nc_lookup(dcp, nm, len, cred);
>      if (cp) {
>  	*vpp =3D CTOV(cp);
> @@ -935,8 +950,7 @@ coda_lookup(void *v)
>  	CODADEBUG(CODA_LOOKUP,
>  		 myprintf(("lookup result %d vpp %p\n",error,*vpp));)
>      } else {
> -
> -	/* The name wasn't cached, so we need to contact Venus */
> +	/* The name wasn't cached, so ask Venus. */
>  	error =3D venus_lookup(vtomi(dvp), &dcp->c_fid, nm, len, cred, l, &VFid=
, &vtype);
> =20
>  	if (error) {
> @@ -952,9 +966,13 @@ coda_lookup(void *v)
> =20
>  	    cp =3D make_coda_node(&VFid, dvp->v_mount, vtype);
>  	    *vpp =3D CTOV(cp);
> +	    /* vpp is now vrefed. */
> =20
> -	    /* enter the new vnode in the Name Cache only if the top bit isn't =
set */
> -	    /* And don't enter a new vnode for an invalid one! */
> +	    /*
> +	     * Unless this vnode is marked CODA_NOCACHE, enter it into
> +	     * the coda name cache to avoid a future venus round-trip.
> +	     * XXX Interaction with componentname NOCACHE is unclear.

My thought is that having either one of them set should inhibit caching.=20
When is CODA_NOCACHE set?

> +	     */
>  	    if (!(vtype & CODA_NOCACHE))
>  		coda_nc_enter(VTOC(dvp), nm, len, cred, VTOC(*vpp));
>  	}
> @@ -978,6 +996,11 @@ coda_lookup(void *v)
>  	cnp->cn_flags |=3D SAVENAME;
>  	*ap->a_vpp =3D NULL;
>      }
> +    /*
> +     * XXX If creating and we found something, does that need to be an
> +     * error?
> +     */

Not necessarily. All of the CREATE/RENAME/DELETE stuff is an optimization.=
=20
The idea is that if we're creating, we have to look through the directory=
=20
to see if the name already exits, and we also have to look to see where=20
there's a free slot to insert the new name. Adding CREATE is all about=20
compacting that into one check. We keep the parent directory locked so=20
that no one can use the empty slot we found before we do.

Likewise RENAME and DELETE are optimizations for those ops. If we're about=
=20
to rename something, it makes no sense to add it to the name cache.

Also, Chuq changed how all of this works recently. I'm not 100% on the new=
=20
stuff, but he simplified it all.

>      /*
>       * If we are removing, and we are at the last element, and we
> @@ -997,17 +1020,25 @@ coda_lookup(void *v)
>      }
> =20
>      /*
> -     * If the lookup went well, we need to lock the child.
> +     * If the lookup succeeded, we must generally lock the returned
> +     * vnode.  This could be a ., .., or normal lookup.  See
> +     * vnodeops(9) for the details.

I really think you need to start this dance sooner.

>       */
>      if (!error || (error =3D=3D EJUSTRETURN)) {
> -	/* The parent is locked, and may be the same as the child */
> +	/* Lookup has a value and it isn't "."? */
>  	if (*ap->a_vpp && (*ap->a_vpp !=3D dvp)) {
> -	    /* Different, go ahead and lock it. */
> -	    vn_lock(*ap->a_vpp, LK_EXCLUSIVE|LK_RETRY);
> +	    if (flags & ISDOTDOT)
> +		/* ..: unlock parent */
> +		VOP_UNLOCK(dvp, 0);
> +	    /* all but .: lock child */
> +	    vn_lock(*ap->a_vpp, LK_EXCLUSIVE | LK_RETRY);
> +	    if (flags & ISDOTDOT)
> +		/* ..: relock parent */
> +	        vn_lock(dvp, LK_EXCLUSIVE | LK_RETRY);
>  	}
> +	/* else .: leave dvp locked */
>      } else {
> -	/* If the lookup failed, we need to ensure that the leaf is NULL */
> -	/* Don't change any locking? */
> +	/* The lookup failed, so return NULL.  Leave dvp locked. */
>  	*ap->a_vpp =3D NULL;
>      }
>      return(error);

Looks better though!!

Take care,

Bill

--2oS5YaxWCcQjTEyO
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (NetBSD)

iD8DBQFGFE1MWz+3JHUci9cRAtadAJ9L/x0GiFL6A1cxpw1OEgJ+hRY9YgCfePtO
VWgUFu+F6e+0l356YJsrgN4=
=vW3C
-----END PGP SIGNATURE-----

--2oS5YaxWCcQjTEyO--