Subject: Re: pr/35143 and layer_node_find()
To: Chuck Silvers <chuq@chuq.com>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 12/06/2006 21:59:32
--Bn2rw/3z4jIqBvZU
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Wed, Dec 06, 2006 at 09:01:33PM -0800, Chuck Silvers wrote:
> On Sun, Dec 03, 2006 at 01:06:59PM -0800, Bill Studenmund wrote:
> > > wouldn't getcleanvnode() skip over this locked VLAYER vnode?
> >=20
> > Hmm... It's supposed to.
> >=20
> > Please see Darrin's analysis in the PR. If you can see something that's=
=20
> > not right in it, please comment!
>=20
> I've figured out what's going wrong with the VOP_ISLOCKED() call.
> lockstatus() can return 0 incorrectly if the lk_flags contains LK_WANT_EX=
CL
> or LK_WANT_UPGRADE.  so if we have this sequence of events then we'll
> hit the deadlock:
>=20
> thread 1 locks the lower vnode with LK_SHARED and sleeps.
> thread 2 sets LK_WANT_EXCL on the vnode lock and sleeps.
> thread 1 wakes up and releases the vnode lock.
> thread 3 tries to recycle the upper vp, set XLOCK but sleeps on VOP_LOCK()
>          because LK_WANT_EXCL is set
> thread 2 wakes up, clears LK_WANT_EXCL, finishes acquiring the lock,
>          and then calls vget() on the upper vnode and sleeps on XLOCK.
>=20
> now threads 2 and 3 are deadlocked.
>=20
> the attached patch fixes this problem, and with both this patch and
> my other patch for the vrele() issue applied, a stress test survives
> much longer than it previously took to trigger a hang.

What eventually kills it?

> any objection to going with this approach?

We DEFINITELY need this patch. How exactly did you figure this out? :-)

I still am uncomfortable with us calling vget() without LK_NOWAIT as it=20
seems like a deadlock waiting to happen, protected by the race in the=20
getnewvnode() test. Also, if we had a better way to recover from the=20
reaping case, we wouldn't need the test in getnewvnode(). :-)

However I agree totally with your patch on everything except this one=20
point, so it's probably better to get this into 4.0 and keep thinking than=
=20
to not.

You earlier sounded like you didn't like my thought (due to the=20
multi-instance aspect) but that you didn't object to it. Would you mind me=
=20
making that change after you check all this in (and after I get your=20
stress tool & run it)?

Take care,

Bill

--Bn2rw/3z4jIqBvZU
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (NetBSD)

iD8DBQFFd63EWz+3JHUci9cRAli5AJ4+t1mFDc6HhJiFNXAVyUY2rE2oZgCeOqZK
0QgJFH8KTsnD2ucoXJ+AQS4=
=6mKX
-----END PGP SIGNATURE-----

--Bn2rw/3z4jIqBvZU--