Subject: Re: pr/35143 and layer_node_find()
To: Chuck Silvers <chuq@chuq.com>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 12/03/2006 13:06:59
--wRRV7LY7NUeQGEoC
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Fri, Dec 01, 2006 at 09:52:49AM -0800, Chuck Silvers wrote:
> On Thu, Nov 30, 2006 at 10:10:22AM -0800, Bill Studenmund wrote:
> > > I think we need to figure that out before we decide on a fix.
> >=20
> > Ok, here's an idea on how it can happen on our current kernel.
> >=20
> > We have one process holding the lock on the lower vnode, upper vnode is=
=20
> > unreferenced.
> >=20
> > Then another process comes into the layered file system, does a lookup =
on=20
> > vnode, and blocks in VOP_LOOKUP() on the lower layer waiting for the lo=
ck.
> >=20
> > Then a thrid process comes in and decides to recycle a vnode. It gets t=
he=20
> > layer vnode, sets VXLOCK, then goes to sleep waiting to get the stack's=
=20
> > vnode lock.
>=20
> wouldn't getcleanvnode() skip over this locked VLAYER vnode?

Hmm... It's supposed to.

Please see Darrin's analysis in the PR. If you can see something that's=20
not right in it, please comment!

Even if you and I can't figure out how it happened, we have a report of an
observed situation where layer_node_find() got stuck because it had the
vnode stack's lock and VXLOCK was set on the vnode, thus the vget() call
deadlocked.

Note also that if we fix this issue, the test in getnewvnode() can go=20
away. While it would be better to try an unlocked vnode, we would still be=
=20
able to have things work with a locked one.

> > We have to have the vget() not wait if it sees VXLOCK.
> >=20
> > I still don't see what's wrong with letting the being-destroyed nodes s=
tay=20
> > in the hash table. For them to have VXLOCK set, there has to be a threa=
d=20
> > reclaiming them, so they will be removed from the hash list in due time.
>=20
> I don't know that it would cause any particular problem right now,
> it just doesn't seem like a good idea to allow multiple vnodes with
> the same identity to exist at the same time, even if all but one of
> them are in the process of being reclaimed.

One way to look at it is that the moment VXLOCK gets set, the vnode no=20
longer is the upper one for the given lower vnode.

I guess part of why I'm comfortable with this is that I've doen it before.
I've worked on an iSCSI target, and you can run into issues like this
(where something still exists since it hasn't been cleaned up, but its
visibility has been removed). Specifically an iSCSI task can come in,
acknowledge a previous command, and simultaneously start a new command
with the same tag.

I'd appreciate the suggestion of another way to fix, but I can't think of=
=20
anything that'd help w/o doing somehting that can make a mess elsewhere.

The one thing I think I might like would be to add a vnode cleaning thread=
=20
that tries to keep X vnodes clean-and-free.

Take care,

Bill

--wRRV7LY7NUeQGEoC
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (NetBSD)

iD4DBQFFczxzWz+3JHUci9cRAttwAJYqW+rMjncWzgZckdtReOqbYKBKAJ4jsASv
3m/vg5ebeD4DAucq9EBktQ==
=05dD
-----END PGP SIGNATURE-----

--wRRV7LY7NUeQGEoC--