Subject: Re: pr/35143 and layer_node_find()
To: Chuck Silvers <chuq@chuq.com>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 11/29/2006 11:20:19
--LQksG6bCIzRHxTLp
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Wed, Nov 29, 2006 at 09:16:50AM -0800, Chuck Silvers wrote:
> On Tue, Nov 28, 2006 at 12:47:32PM -0800, Bill Studenmund wrote:
> > The patch does two things. The main thing it does is change the vget() =
in=20
> > layer_node_find() to a) not perform a lockmgr() op (I didn't know you=
=20
> > could do this, but it's exactly what I want) and b) set LK_NOWAIT. This=
=20
> > way, we either pull the vnode off the free list or we get an error=20
> > indicating that the node's being recycled.
> >=20
> > The other thing the patch does is add layer_node_find1(), which now
> > contains all the code. The caller must hold lmp->layerm_hashlock
> > explicitly. The main reason for this is so layer_node_alloc() can hold =
it
> > the whole time. This way we close a window in which, if the locking
> > protocol were ever changed, two different upper nodes could be added (in
> > the same layer) above one given lower node. Since we want to change the=
=20
> > locking protocol in the future, I want to make this change. Also, it ma=
kes=20
> > some of the cleaning case issues clear in my head. :-)
>=20
> your patch allows multiple instances of the same vnode in the hash
> if all but at most one of them is being recycled.  I don't think
> that's a very good idea, I think we should maintain the usual rule
> that there cannot be duplicates.  to fix the MP locking in this case
> we just need to make sure that if layer_node_find() returns NULL then
> we hold the hashlock from the time we started looking through the hash
> bucket until after we add the new node to the hash bucket.  if
> layer_node_find() returns a valid vp, then it's ok that we dropped the
> hashlock while calling vget() on it.

But we can't return a valid vp in this case. To get into this corner case,=
=20
we have to have another thread in the kernel actively in the process of=20
cleaning said vnode. To be honest, I'm not 100% sure how a big-lock kernel=
=20
got into this case, but it did...

The point is that as soon as we release the lock on the vnode stack, the
vnode that is currently in the list will change file system types; it will
get reclaimed and issued to some other mount that wants a vnode. So if we
return it, something can be looking up a path in /null/ftp/pub and get an
ffs vnode for /var/log/messages. That's bad. REALLY bad.

VXLOCK has been set on the old upper vnode. We can't add references for it=
=20
anymore! Any VOPs in progress have to error out.

So we either have to do what I'm doing (pretend the old upper vnode isn't=
=20
there anymore), or we have to error out the VOP_LOOKUP() with some sort of=
=20
ETRYAGAIN. And we have to teach lookup() to retry in case of said error=20
return.

Trying again strikes me as kinda lame. The lookup succeeded! But if that's=
=20
what we have to do, we can. It would mean, though, that we have to release=
=20
(somehow or other) the vnode stack lock, so that the existing node can get=
=20
reclaimed.

Take care,

Bill

--LQksG6bCIzRHxTLp
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (NetBSD)

iD8DBQFFbd1zWz+3JHUci9cRAgVdAJ931XynsQoN97buuVYXa5/CqwXWuQCdFFRg
GHE6g+/XTMy1+f4b3OX2DFI=
=SVtn
-----END PGP SIGNATURE-----

--LQksG6bCIzRHxTLp--