Subject: Locking errors
To: None <tech-kern@netbsd.org>
From: Dr. Bill Studenmund <wrstuden@loki.stanford.edu>
List: tech-kern
Date: 01/29/1999 18:52:57
I'm adding some features to ffs, and in the process have found a locking
protocol error. Well two. I'm not sure what to do with it, so I'm emailing
here. :-)

According to /sys/kern/vnode_if.src, which is AFAICT the definitive
documentation on the locking protocol (ack!), vop_close is supposed to be
called with an unlocked vnode. So says the vop_close definition (around
line 101).

vclean, though, breaks that spec. vclean is in /sys/kern/vfs_subr.c. On
line 1127, it calls VOP_LOCK(vp, LK_DRAIN | LK_INTERLOCK) which will call
any other processes waiting for this lock, then give it to us. So this
process now has an exclusive lock. Then on line 1142, we call
VOP_CLOSE(vp, FNONBLOCK, NOCRED, NULL) to try and close the file.

But we're supposed to pass VOP_CLOSE an UNlocked vnode!

This problem only showed up when I added an ffs_close routine and put in a
vn_lock(vp, LK_EXCLUSIVE | LK_RETRY)/VOP_UNLOCK(vp, 0) pair. The kernel
then just sat there in vn_lock doing nothing.

Hmm. There is another problem with vclean. It has set VXLOCK in the vnode
flags and then proceeds to try to do things with it. As Jason has
explained VXLOCK to me, it means "This vnode is dying. Go away." That
doesn't sound like the kind of vnode we should be feeding to VOP_CLOSE!
Also, since VOP_RECLAIM wants an unlocked vnode, I'm not sure we should
feed it the VXLOCK'd vnode, but that might be ok if we document
VOP_RECLAIM to deal.

How does this pseudocode revamped vclean look:

a) Define a VXLOCK1 flag which keeps vclean from being called recursivly.

b) Make VOP_INACTIVE not unlock the vnode of its own accord.

vclean(vp)
{
Panic if VXLOCK1 set

Set VXLOCK & VXLOCK1

VOP_LOCK(vp, LK_DRAIN | LK_INTERLOCK | LK_CANRECURSE)
/* The drain above wakes up all of the processes wanting this lock. They
see VXLOCK set and thus either return errors or sleep intending to return
an error on awake */

Clear VXLOCK  so that commands below function fine

Do all of the clean up that vclean does now

set the vnode's vnops to the deadfs ops

clear VXLOCK1, unlock the node, and awaken sleepers.



thoughts?

Take care,

Bill