Subject: Re: mount_null: /mnt (/mnt) and /mnt are not distinct paths
To: Bill Sommerfeld <sommerfeld@orchard.arlington.ma.us>
From: Bill Studenmund <wrstuden@nas.nasa.gov>
List: tech-kern
Date: 07/06/1999 13:35:07
On Tue, 6 Jul 1999, Bill Sommerfeld wrote:

> > Wouldn't this also deadlock if you, e.g., 
> > 
> > 	mount -t null /foo /mnt
> > 	mount -t null /foo /mnt/xxx/yyy
> 
> Yes, this is another screw case.

And one we need to somehow prevent. This shared lock discussion won't
remove the danger of deadlock present in the above case. consider name
lookups with LOCKPARENT set on both /mnt/xxx and /mnt/xxx/yyy. :-)

> > I don't see a good way to avoid this problem, since the fact that nullfs
> > can clone parts of the tree somewhere else effectively invalidates the
> > assumption that the filesystem has a tree structure, which is what you're
> > relying on by ordering the vnodes as stated.
> 
> What I've been thinking about is the use of shared locks for vnode
> lookup.

The devil's advocate in me points out that shared locks won't cure the
problems we discuss, just make them much less likely to happen. :-) We
really need to figure out a way to prevent mounts like the above, but I'm
not sure how.

> The problem with this with the current lockmgr is that:
> 
> 	a) you need to go make anything that can be hit during a
> VOP_LOOKUP "safe" with multiple threads touching the vnode.  At a
> minimum, This is going to be most of the read-only side of the
> filesystem.

?? How is this a lock manager problem?

> 	b) because of how VOP_LOOKUP and directory-modifying ops work,
> you need some way to cleanly and atomically update a shared lock to an
> exclusive lock.  (LK_UPGRADE and LK_EXCLUPGRADE are rather messy that
> way; I'd lke to add an LK_INTENDUPGRADE which would:
> 	- block until nobody else either:
> 		1) holds an exclusive lock
> 		2) holds a SHARED | INTENDUPGRADE lock.
> 
> At that point, you can guarantee that the process holding the
> SHARED|INTENDUPGRADE lock will succeed if it tries an LK_EXCKUPGRADE,
> and don't have to worry about someone getting in sideways (which can
> happen with an LK_UPGRADE).

Why do we need this? I understand how we can loose the lock in an
LK_EXCKUPGRADE. But when we do get the lock, we'll either lock the same
vnode we had (in which case it doesn't matter that we lost the lock), or
we'll get a deadfs vnode, in which case we've lost anyway. :-)

> Given those prerequisites, I think the locking protocol for lookups
> should look something like:
> 
> 	- during a namei LOOKUP operation, namei uses shared locks for
> everything except the final vnode, where it takes a
> SHARED|INTENDUPGRADE; caller gets to upgrade the lock when appropriate.
> 
> 	- during CREATE/DELETE/RENAME, the same sort of dance is done
> to the final directory (which is the one being edited..).
> 
> Comments?

Why do we need SHARED|INTENDUPGRADE, if lookup() or namei() will do the
lock upgrade? Why not just get an exclusive lock in the beginning? It adds
more work for no noticable (to me) gain. If we're going to have namei()'s
caller do the upgrade, then we're radically changing the interface and
will need to audit much more code. I think that'd be bad. I think it's
much better to make all of this happen inside lookup().

I think what'd be much cleaner is to have lookup tell VOP_LOOKUP to return
the node exlock'd or shlock'd, and to tell it if the parrent is
exclusively or shared locked.

(asside: this might fix the adosfs problem of loking the parent directory,
if I understand it right.  If adosfs_lookup() get passed the flags
indicating the parrent's locked, it doesn't need to lock it. Otherwise it
just needs a shared lock.)

Here's my general idea of how it'd work (note all line #'s relative to
rev. 1.30): 

The vn_lock in the entry of lookup() on line 299 gets a shared lock.

(The "all slashes" case near line 330 might need a change - not sure)

Just before unionlookup:, if we are on the last component (ISLASTCN set) 
and the parrent is wanted locked, we LK_EXCLUPGRADE our lock. If we loose
the lock (because someone else is upgrading to exclusive) we try again. If
we fail (i.e. a vgone or vclean happened) we bail. 

We set flags to indicate if the parrent lock is exclusive.

If we're on the last component and LOCKLEAF is set, we request an
exclusivly locked vnode, otherwise we request a shared-locked vnode.

We then itterate until we've killed the whole path.

I think this flow would give all the advantages you were looking for
without needing to change the lock manager and keep all of the shared
locks within lookup().

Thoughts?

Take care,

Bill