Subject: Re: Read-write vnode locks
To: Bill Sommerfeld <>
From: Bill Studenmund <>
List: tech-kern
Date: 09/13/1999 12:17:23
First off, sounds like a good idea. A radical change, but a good one.

I also think there are a few points which need reworking, and I think they
are similar to what Bill discusses below.

My concern is with the idea that all VOP_ calls be made with read-locked
(LK_SHARED) vnodes. I think for some operations it makes more sense to
pass down a write-locked (LK_EXCLUSIVE) vnode. The two that come to
mind are VOP_READ and VOP_WRITE.

For instance, at the moment vn_read() does a vn_lock(), VOP_READ(),
VOP_UNLOCK(). If we had to pass in shared locks, we'd first have to
shared-lock the node in vn_read, and then the fs would have to upgrade the
lock to an exclusive one. Then it would have to downgrade the lock on the
way out (since you should return the node in the same lock state it came
in), and then we'd unlock it. Seems to me it's easier to just pass down an
LK_EXCLUSIVE lock.

Also, if we let some operations take write-locked vnodes, I think we'd
address Bill's concern below: we could centralize all of the lock
upgrading in the higher-level routines. That way every fs gets a node
locked appropriately for what it's doing.

On Sat, 11 Sep 1999, Bill Sommerfeld wrote:

> There are a couple things which will need to be reworked which you
> didn't mention in your message, mostly related to read-to-write
> upgrade issues.
>
> Our current lockmgr provides two ways to upgrade a shared lock to an
> exclusive lock.  Both are .. problematic.
>
> LK_UPGRADE is guaranteed to work, but involves releasing the shared
> lock before reacquiring an exclusive lock; this means that other
> processes may get in with an exclusive lock (and change things)
> during the upgrade, which means that you can't make many assumptions
> based on values you looked at while holding the read lock.
>
> LK_EXCLUPGRADE does not allow another process to get in sideways, but
> can fail (if another process is already waiting for an EXCLUPGRADE on
> the same lock), which means you need some recovery code which handles
> this case.
>
> For instance, ufs_lookup of a nonexistent name leaves a few
> breadcrumbs around in the directory inode and the cnp structure to
> tell a subsequent ufs_direnter() call where to put the name;
> currently, it can get away with this because exclusive locks are used;
> this would need to be reworked.  (Look at the EJUSTRETURN return path
> in ufs_lookup.)
>
> Each of the directory ops (VOP_*) would thus need to upgrade the lock,
> and then revalidate the directory entries and possibly fail.  Some
> sort of common routine for this revalidation would make sense; your
> design would make this per-filesystem and put it under the VOP layer..

How about this: we kinda keep LOCKPARENT & LOCKLEAF, but only the very end
of namei() would know/care about them. We do all the lookups with shared
locks, as Charles suggested, and then at the very end, if this was the
last component, we upgrade the locks to exclusive as per LOCK*.

If we use LK_UPGRADE, we could add the id check: if the id number
changed across the upgrade, namei() could re-execute the lookup of the
last component.

> BTW, count me in as part of the set of people willing to work on
> making this change..  

Me too. :-)

Take care,