Subject: Re: mount_null: /mnt (/mnt) and /mnt are not distinct paths
To: Konrad Schroder <perseant@hitl.washington.edu>
From: Bill Sommerfeld <sommerfeld@orchard.arlington.ma.us>
List: tech-kern
Date: 07/06/1999 00:52:03
> Wouldn't this also deadlock if you, e.g., 
> 
> 	mount -t null /foo /mnt
> 	mount -t null /foo /mnt/xxx/yyy

Yes, this is another screw case.

> I don't see a good way to avoid this problem, since the fact that nullfs
> can clone parts of the tree somewhere else effectively invalidates the
> assumption that the filesystem has a tree structure, which is what you're
> relying on by ordering the vnodes as stated.

What I've been thinking about is the use of shared locks for vnode
lookup.

There are two problems with doing this with the current lockmgr:

	a) you need to go and make anything that can be hit during a
VOP_LOOKUP "safe" with multiple threads touching the vnode.  At a
minimum, this is going to be most of the read-only side of the
filesystem.

	b) because of how VOP_LOOKUP and directory-modifying ops work,
you need some way to cleanly and atomically upgrade a shared lock to an
exclusive lock.  LK_UPGRADE and LK_EXCLUPGRADE are rather messy that
way; I'd like to add an LK_INTENDUPGRADE which would:
	- block until nobody else either:
		1) holds an exclusive lock, or
		2) holds a SHARED|INTENDUPGRADE lock.

At that point, you can guarantee that the process holding the
SHARED|INTENDUPGRADE lock will succeed if it tries an LK_EXCLUPGRADE,
and you don't have to worry about someone getting in sideways (which
can happen with an LK_UPGRADE).
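
To make those semantics concrete, here is a minimal user-space sketch
using pthreads.  It is not lockmgr code and not a proposed patch:
struct iu_lock and all of the iu_* functions are hypothetical names
made up for illustration.  The `upgrading` flag, which holds off new
shared lockers while the intent holder drains the existing ones, is an
extra assumption that is not part of the description above.  A direct
exclusive acquire is left out to keep the sketch short.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space model of the proposal; NOT the kernel lockmgr. */
struct iu_lock {
	pthread_mutex_t mtx;
	pthread_cond_t cv;
	int shared;		/* holders of plain shared locks */
	bool intent;		/* someone holds SHARED|INTENDUPGRADE */
	bool upgrading;		/* assumption: intent holder is draining readers */
	bool exclusive;		/* someone holds the exclusive lock */
};

void
iu_init(struct iu_lock *l)
{
	pthread_mutex_init(&l->mtx, NULL);
	pthread_cond_init(&l->cv, NULL);
	l->shared = 0;
	l->intent = l->upgrading = l->exclusive = false;
}

/* Plain shared lock: coexists with other shared holders and the intent holder. */
void
iu_lock_shared(struct iu_lock *l)
{
	pthread_mutex_lock(&l->mtx);
	while (l->exclusive || l->upgrading)
		pthread_cond_wait(&l->cv, &l->mtx);
	l->shared++;
	pthread_mutex_unlock(&l->mtx);
}

void
iu_unlock_shared(struct iu_lock *l)
{
	pthread_mutex_lock(&l->mtx);
	assert(l->shared > 0);
	l->shared--;
	pthread_cond_broadcast(&l->cv);
	pthread_mutex_unlock(&l->mtx);
}

/* SHARED|INTENDUPGRADE: block until nobody else holds exclusive or intent. */
void
iu_lock_intent(struct iu_lock *l)
{
	pthread_mutex_lock(&l->mtx);
	while (l->exclusive || l->intent)
		pthread_cond_wait(&l->cv, &l->mtx);
	l->intent = true;
	pthread_mutex_unlock(&l->mtx);
}

/* Drop the intent lock without upgrading. */
void
iu_unlock_intent(struct iu_lock *l)
{
	pthread_mutex_lock(&l->mtx);
	assert(l->intent && !l->upgrading);
	l->intent = false;
	pthread_cond_broadcast(&l->cv);
	pthread_mutex_unlock(&l->mtx);
}

/*
 * Upgrade to exclusive.  There is at most one intent holder, so no
 * other upgrader can get in sideways; we only wait for readers to drain.
 */
void
iu_upgrade(struct iu_lock *l)
{
	pthread_mutex_lock(&l->mtx);
	assert(l->intent);
	l->upgrading = true;
	while (l->shared > 0)
		pthread_cond_wait(&l->cv, &l->mtx);
	l->intent = l->upgrading = false;
	l->exclusive = true;
	pthread_mutex_unlock(&l->mtx);
}

void
iu_unlock_exclusive(struct iu_lock *l)
{
	pthread_mutex_lock(&l->mtx);
	assert(l->exclusive);
	l->exclusive = false;
	pthread_cond_broadcast(&l->cv);
	pthread_mutex_unlock(&l->mtx);
}
```

The point of the model is that iu_upgrade() never has to fail or
restart: the only thing it can be waiting for is plain shared holders
going away, since a second would-be upgrader is still blocked in
iu_lock_intent().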

Given those prerequisites, I think the locking protocol for lookups
should look something like:

	- during a namei LOOKUP operation, namei uses shared locks for
everything except the final vnode, where it takes a
SHARED|INTENDUPGRADE; caller gets to upgrade the lock when appropriate.

	- during CREATE/DELETE/RENAME, the same sort of dance is done
to the final directory (which is the one being edited).
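
As a rough illustration of the two rules above, the following sketch
maps each vnode visited during a path walk to the lock mode it would
take.  The enum and function names are hypothetical and are not the
namei interface.  It assumes "the final directory" means the parent of
the last path component, and it deliberately leaves the last component
itself unspecified for CREATE/DELETE/RENAME, since the rules above do
not say what happens to it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for illustration; not the real namei interface. */
enum walk_op { WALK_LOOKUP, WALK_CREATE, WALK_DELETE, WALK_RENAME };
enum walk_lock { WALK_SHARED, WALK_SHARED_INTENDUPGRADE, WALK_UNSPECIFIED };

/*
 * Lock mode for the i-th vnode of a walk that visits n vnodes
 * (index 0 is the starting directory, index n-1 the final vnode).
 */
enum walk_lock
walk_lock_mode(enum walk_op op, size_t i, size_t n)
{
	assert(n >= 1 && i < n);

	/* LOOKUP: shared everywhere except the final vnode. */
	if (op == WALK_LOOKUP)
		return (i == n - 1) ? WALK_SHARED_INTENDUPGRADE : WALK_SHARED;

	/*
	 * CREATE/DELETE/RENAME: the directory being edited (assumed to
	 * be the parent of the last component) takes the intent lock.
	 */
	assert(n >= 2);
	if (i == n - 1)
		return WALK_UNSPECIFIED;	/* not covered by the rules above */
	return (i == n - 2) ? WALK_SHARED_INTENDUPGRADE : WALK_SHARED;
}
```

For a walk of /a/b/c starting at the root (n = 4), a LOOKUP would take
shared locks on /, a and b and SHARED|INTENDUPGRADE on c, while a
CREATE would take shared locks on / and a and SHARED|INTENDUPGRADE on
b.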

Comments?

					- Bill