Subject: change to vfs_busy() and getnewvnode()
To: None <tech-kern@netbsd.org>
From: Bill Sommerfeld <sommerfeld@orchard.arlington.ma.us>
List: tech-kern
Date: 07/04/1999 13:27:03
So, I just fixed a long-standing bug where umount -f could cause a
"dangling vnode" panic.

I think the change is pretty well documented by the change comment,
but I'm sending a heads-up here just so people know.

    Fix kern/7906: race between unmount and getnewvnode()

    mp->mnt_flags & MNT_MWAIT is replaced by mp->mnt_wcnt, and a new mount
    flag MNT_GONE is created (reusing the same bit).

    In insmntque(), add a DIAGNOSTIC check that fails if the filesystem
    the vnode is being moved to is in the process of being unmounted.

    getnewvnode() now protects the list of vnodes active on mp with
    vfs_busy()/vfs_unbusy().

    To avoid generating spurious errors during a doomed unmount, change
    the "wait for unmount to finish" protocol between dounmount() and
    vfs_busy().  In vfs_busy(), instead of only sleeping once, sleep until
    either MNT_UNMOUNT is clear or MNT_GONE is set; also, maintain a count
    of waiters in mp->mnt_wcnt so that dounmount() knows when it's safe to
    free mp.

    tested by running a "while :; do mount /d1; umount -f /d1; done" loop
    against multiple find(1) processes.

There are two notable behavior changes as a result of this:
	- vfs_busy() can sleep, and then return success; previously,
it would only sleep when it was going to return failure anyway.  A
quick audit of the places where vfs_busy() is called didn't show any
obvious problems as a result of this.

	- getnewvnode() will now sleep if an unmount is going on.
Given that it can already sleep (either waiting for memory, or in
vgonel()), this shouldn't cause trouble.

One bit of (not new) ugliness I noticed is that getnewvnode() is
called from {ffs,ext2fs,ufs}_vget() with ufs_hashlock held, which
single-threads all vnode allocations.  Given how much work may need to
be done in vinvalbuf() and vgonel() (vinvalbuf() does a VOP_FSYNC()),
this precludes a bunch of potential I/O parallelism when the system's
dirtying a lot of vnodes (among other things, this probably also does
evil things to LFS).

						- Bill