Subject: LK_DRAIN vs the interlock vs VOP_INACTIVE, was Re: reboot problems unmounting root
To: Antti Kantee <pooka@cs.hut.fi>
From: Bill Stouder-Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 07/06/2007 13:53:41

On Thu, Jul 05, 2007 at 01:14:54PM -0700, Bill Stouder-Studenmund wrote:
> > I'd prefer another approach where the whole layering lock export system
> > was redone instead of redoing reclaim to fix the layer lock exporting.
> > But I don't know how to fix layerfs locking ...
>
> Do we care about the interlock? The advantage of the current locking
> behavior is that, since each layer has access to the lockmgr lock, we can
> pass in the right interlock. If we either don't care, or we're fine with a
> lockmgr call on vnode D releasing the interlock on vnode A (because
> someone tried to lock A with its interlock held, and A is above B, which
> is above C, which is above D), then we can just turn it all into recursive
> VOP_LOCK() calls on the underlying vnodes.

Ok, this is the crux of the issue.

We export a vnode lock pointer for a number of reasons. A key one is that
LK_INTERLOCK, in terms of VOP_LOCK(), specifically means we hold the
interlock on the vnode on which we're calling VOP_LOCK(). By exporting a
lock pointer, upper layers can call lockmgr directly with the right
interlock. We also save a function call per vnode in the stack, along with
all the I-cache & D-cache impact that entails.

Right now, the only fs that doesn't export a lock is unionfs, mainly
because it has to lock two underlying vnodes.

vclean() locks the vnode with LK_DRAIN, then either unlocks the node
directly or lets VOP_INACTIVE() do it. After that point, calling the lock
manager on the lock triggers the panic we're seeing.

The difficulty is that upper layers don't know when the lower layer's been
vclean()ed, so they keep calling the lock manager directly. Boom!

So there are a number of things we can do.

1) Stop exporting the lock, and force recursion across the board. This
will have performance impacts, and botch the interlock.

2) Stop exporting the lock, force recursion, but recurse the interlock.
Lock the lowervp interlock, then release the upper one, then hand the lock
call off.

3) Add LK_REENABLE to the unlock of the lock in vclean(). I like this idea
the best, as it means the least change; a change pulled into netbsd-4 and
netbsd-3 will leave the kernel working mostly as-is. LK_REENABLE is a
feature of the lock manager that lets an LK_DRAINed lock come back to
life.

The issue here is that LK_REENABLE only works when it is passed during the
unlock that releases the LK_DRAIN. If the vnode is in use, the very case
we care about, this unlock will happen in the call to VOP_INACTIVE().
So...

3a) We change all the VOP_INACTIVE routines to add LK_REENABLE to their
unlock calls. This change will have no impact on anything other than
LK_DRAIN, as that's the only thing that looks at this flag. It does,
however, mean changing EVERY inactive routine to handle something that
happens rarely.

3b) Call lockmgr on the lock (or more accurately, VOP_LOCK with flags) to
do the same thing as LK_REENABLE. Just passing it in would be fine; the
problem is we don't have a "do nothing" lock operation, so we'd also need
to add one.

3c) Have vclean() grab a recursive vnode lock, then LK_RELEASE |
LK_REENABLE it. The problem is that anything other than LK_RELEASE as the
lock operation will trigger a panic.

So what do folks think of 3b?

Take care,

Bill
