Subject: Re: CVS commit: src/sys
To: None <tech-kern@NetBSD.org>
From: Andrew Doran <ad@netbsd.org>
List: tech-kern
Date: 07/29/2007 14:17:22
On Sun, Jul 29, 2007 at 12:40:37PM +0000, Antti Kantee wrote:

> Modified Files:
> 	src/sys/kern: kern_lock.c vfs_subr.c
> 	src/sys/sys: lock.h
> 
> Log Message:
> Define a new lockmgr flag LK_RESURRECT which can be used in
> conjunction with LK_DRAIN.  This has the same effect as LK_DRAIN
> except it atomically does NOT mark the lock as drained.  This
> guarantees that when we got the lock, we were the last one currently
> waiting for the lock.
> 
> Use LK_DRAIN|LK_RESURRECT in vclean() to make sure there are no
> waiters for the lock.  This should fix behaviour theoretized to be
> caused by vfs_subr.c 1.289 which caused vclean() to run into
> completion and free the vnode before all lock-waiters had been
> processed.  Should therefore fix the "simple_lock: unitialized lock"
> problems seen recently.

I think this is a good stopgap for fixing NetBSD 4, but also that it's the
wrong way to solve the problem. The issue is that there are two threads
trying to deactivate the vnode, and eventually one goes on to reclaim. That
can be prevented by continuing to hold a reference to the vnode across the
call to deactivate it. Once it's deactivated (and we've checked it doesn't
need to be re-deactivated due to a ref in the meantime) we should go on to
lock it and reclaim it. The premise being that no-one should try to lock a
vnode without first holding a reference to it.

On the other hand, maybe it's time we looked at pushing the locking back
into the file systems and being more strict about the reference counting, so
that these kinds of issues don't continue to appear. :-). What do people
think of the idea? Some if the issues involved in doing that I know of are..

o VOP_LOCK and VOP_UNLOCK would need to go away. To make VOP calls you'd
  only need a reference to the vnode.

o We'd need a VOP_RWLOCK or VOP_RANGELOCK for concurrent reads/writes.

o The calls to VOP_LEASE look suspicious, since they are made and then
  followed by other VOP calls under the same lock. VOP_LEASE doesn't appear
  to do anything these days though.

o There are some place where VOP_ACCESS is followed by a another call into
  the file system. Those checks would need to go into the file system.

o Lookup currently holds exclusive vnode locks. It causes a lot of single
  threading and I think we could get a good speedup by avoiding that. My
  understanding is that's due to the inode/vnode being used to maintain
  state for whoever's doing the lookup. What other issues are there? Without
  understanding the code too clearly I imagine that has effects on layered 
  file systems too.

o To eliminate v_interlock's part in locking a vnode, it would be difficult
  to be sure that we have completely reclaimed vnodes from a file system -
  because we can't be absolutely sure at which point calls stop being made
  into foo_vnops and start being made into dead_vnops. That would mean:
  after a forced unmount the file system code would need to stick around
  (not be unloaded) while there are still vnodes that it spawned. I guess we
  could add a reference count to the vnode to count the number of LWPs
  currently within a VOP and poll until the figure drops to zero for all the
  remaining vnodes. That's a lot of additional atomic ops for the system to
  do, though.

Can anyone see other issues? Any other ideas?

Thanks,
Andrew