tech-kern: Re: CVS commit: src/sys

Subject: Re: CVS commit: src/sys
To: Andrew Doran <ad@netbsd.org>
From: Antti Kantee <pooka@cs.hut.fi>
List: tech-kern
Date: 07/29/2007 20:42:12
On Sun Jul 29 2007 at 14:17:22 +0100, Andrew Doran wrote:
> > Define a new lockmgr flag LK_RESURRECT which can be used in
> > conjunction with LK_DRAIN.  This has the same effect as LK_DRAIN
> > except it atomically does NOT mark the lock as drained.  This
> > guarantees that when we got the lock, we were the last one currently
> > waiting for the lock.
> > 
> > Use LK_DRAIN|LK_RESURRECT in vclean() to make sure there are no
> > waiters for the lock.  This should fix behaviour theoretized to be
> > caused by vfs_subr.c 1.289 which caused vclean() to run into
> > completion and free the vnode before all lock-waiters had been
> > processed.  Should therefore fix the "simple_lock: unitialized lock"
> > problems seen recently.
> 
> I think this is a good stopgap for fixing NetBSD 4, but also that it's the
> wrong way to solve the problem. The issue is that there are two threads
> trying to deactivate the vnode, and eventually one goes on to reclaim. That
> can be prevented by continuing to hold a reference to the vnode across the
> call to deactivate it. Once it's deactivated (and we've checked it doesn't
> need to be re-deactivated due to a ref in the meantime) we should go on to
> lock it and reclaim it. The premise being that no-one should try to lock a
> vnode without first holding a reference to it.

The issue this tries to solve is layerfs using drained locks.  Originally
there just was no way to tell lockmgr what we tell it now, so changing
LK_DRAIN to LK_EXCLUSIVE was used as a best-effort approximation.
It failed.

You are fixing a different problem.  It should be fixed independently of
this issue.  They just happened to be intertwined.  But that can happen
$whenever - at least I'm not aware of any more problems with the current
rele-then-lock weirdosity.

> o We'd need a VOP_RWLOCK or VOP_RANGELOCK for concurrent reads/writes.

Is a rangelock really helpful?

> o There are some place where VOP_ACCESS is followed by a another call into
>   the file system. Those checks would need to go into the file system.

Power to the file system.  Also what should be done is define the role
of checks between vfs and the file system better.  My vote is to just
let the file system decide instead of trying to cram random bits and
pieces into the vnode.  Make the vnode less fat!

> o Lookup currently holds exclusive vnode locks. It causes a lot of single
>   threading and I think we could get a good speedup by avoiding that. My
>   understanding is that's due to the inode/vnode being used to maintain
>   state for whoever's doing the lookup. What other issues are there? Without
>   understanding the code too clearly I imagine that has effects on layered 
>   file systems too.

I *think* the state just acts as a cache, e.g. if you're doing a create
lookup you don't need to rescan to find an empty slot.

As I've understood it, the reason for the lock, lock, unlock, lock etc.
namei() is mostly historical ("it's always been done that way").
But this understanding is mostly based on discussions with other people.

> o To eliminate v_interlock's part in locking a vnode, it would be difficult
>   to be sure that we have completely reclaimed vnodes from a file system -
>   because we can't be absolutely sure at which point calls stop being made
>   into foo_vnops and start being made into dead_vnops. That would mean:
>   after a forced unmount the file system code would need to stick around
>   (not be unloaded) while there are still vnodes that it spawned. I guess we
>   could add a reference count to the vnode to count the number of LWPs
>   currently within a VOP and poll until the figure drops to zero for all the
>   remaining vnodes. That's a lot of additional atomic ops for the system to
>   do, though.

I don't think even currently reclaiming a vnode doesn't mean we can't
have references to the inode or the mount point (inside the fs code)
and the fact that unmount -f happens to work is mostly just luck.

-- 
Antti Kantee <pooka@iki.fi>                     Of course he runs NetBSD
http://www.iki.fi/pooka/                          http://www.NetBSD.org/
    "la qualité la plus indispensable du cuisinier est l'exactitude"