Subject: Re: panic: lockmgr: release of unlocked lock
To: Manuel Bouyer <bouyer@antioche.eu.org>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 02/07/2005 09:49:41

On Sat, Feb 05, 2005 at 03:49:59PM +0100, Manuel Bouyer wrote:
> On Tue, Jan 25, 2005 at 04:09:52PM -0800, Bill Studenmund wrote:
> > [...]
> > Maybe. If it is, the following patch should fix things. I printed out
> > lookup() and stared at all the error cases. This fix is the only one I
> > found we need, and we need it strongly. :-)
> >
> > The only other questionable case I found should be fine with the correct
> > input flags. The case was that the goto terminal just above dirloop: could
> > get to bad2 with ni_dvp unset. However we would have to be on a RO fs and
> > (more importantly) be performing a DELETE or RENAME op with both
> > LOCKPARENT and WANTPARENT unset. The latter shouldn't happen.
> >
> > As an aside, it might be interesting to put a diagnostic printf() in the
> > new conditional. If the printf() fires, we know we found the issue.

I assume that was the PDIRUNLOCK below?
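
For anyone following along, here is a rough user-space sketch of the
"diagnostic printf() in the new conditional" idea. It is not the patch
and not the lookup() code. It assumes the conditional exists to skip
releasing a parent lock that has already been released, which this
thread does not spell out. Every name here (toy_lookup, toy_lock,
toy_cleanup_parent, TOY_PDIRUNLOCK and its value) is made up for
illustration; only the message text echoes the console log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical flag bit; the name echoes the log line, the value is arbitrary. */
#define TOY_PDIRUNLOCK 0x1u

struct toy_lock {
	bool held;
};

struct toy_lookup {
	unsigned flags;          /* toy stand-in for the lookup flags */
	struct toy_lock parent;  /* toy stand-in for the parent dir lock */
	unsigned diag_hits;      /* how often the diagnostic fired */
};

/*
 * Error-path cleanup for the toy model. Releases the parent lock only
 * if the flags say it is still held. Returns true if it released it.
 */
bool
toy_cleanup_parent(struct toy_lookup *lk)
{
	assert(lk != NULL);

	if (lk->flags & TOY_PDIRUNLOCK) {
		/*
		 * The "new conditional": the parent was already unlocked,
		 * so a second release would be the bug. Log it and skip.
		 */
		lk->diag_hits++;
		fprintf(stderr, "toy_lookup(): PDIRUNLOCK\n");
		return false;
	}

	lk->parent.held = false;
	return true;
}
```

The counter is only there so the sketch can be checked without
scraping stderr. The point is the one in the quoted text: if the
message ever shows up, the guarded path really is being reached.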

> Hi,
> after about a week of compiles, I got this on the console:
> Feb  4 03:21:24 folk /netbsd: vnode: table is full - increase kern.maxvnodes or NVNODE
> Feb  4 03:21:24 folk last message repeated 135 times
> Feb  5 07:27:11 lookup(): PDIRUNLOCK
> folk /netbsd: vnode: table is full - increase kern.maxvnodes or NVNODE
> Feb  5 07:27:12 folk last message repeated 2 times
>
> The box didn't panic, but I can't log in. Lots of processes blocked on
> vnlock. I have null mounts on this box (for the bulk build), with nullfs
> loaded via LKM, maybe this is the reason. I've got a core dump from ddb.

I don't think LKM is the issue, nor is nullfs directly the issue. I think
the problem now is that you've hit a lingering low-resources issue. Either
NetBSD is somehow mis-handling the low-resources case (something we know
we aren't good at), or there's another bug in error-case handling that we
haven't found.

All I can think to do is further discuss this on tech-kern.

Oh, I think all the processes in vnlock aren't the core problem. Something
else has a vnlock and is waiting for some other resource that isn't there.
Then everything else piles up on the different vnlocks.
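
To make that pile-up concrete, here is a small user-space sketch of how
one might walk from any process stuck on a lock back to the one that is
really stuck. It is a toy model under my own assumptions, not kernel
code and not how ddb presents things: the struct, the owner[] table and
toy_root_blocker() are all hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define NO_LOCK (-1)
#define NO_PROC (-1)

/* Toy process: what it is sleeping on, if anything. */
struct toy_proc {
	int waits_for_lock;   /* index of the lock it sleeps on, or NO_LOCK */
	int waits_for_other;  /* nonzero if it sleeps on a non-lock resource */
};

/*
 * Follow the "waits for lock -> lock owner" chain starting at process
 * `start`. owner[l] is the index of the process holding lock l, or
 * NO_PROC if the lock is free. Returns the process at the end of the
 * chain, or NO_PROC if the chain loops (a deadlock purely among locks).
 */
int
toy_root_blocker(const struct toy_proc *procs, size_t nprocs,
    const int *owner, size_t nlocks, int start)
{
	int cur = start;
	size_t steps;

	for (steps = 0; steps <= nprocs; steps++) {
		int l;

		assert(cur >= 0 && (size_t)cur < nprocs);
		l = procs[cur].waits_for_lock;
		if (l == NO_LOCK)
			return cur;     /* not blocked on a lock: chain ends */
		assert(l >= 0 && (size_t)l < nlocks);
		if (owner[l] == NO_PROC)
			return cur;     /* lock is free, nothing behind it */
		cur = owner[l];         /* hop to whoever holds that lock */
	}
	return NO_PROC;                 /* more hops than processes: a cycle */
}
```

In this model, if process 0 holds lock 0 and sleeps on something that
is not a lock, while the rest sleep on lock 0 or on locks held by other
sleepers, every chain ends at process 0. That is the process worth
looking at, not the crowd behind it.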

Take care,

Bill
