Subject: Re: PR 32535
To: SODA Noriyuki <firstname.lastname@example.org>
From: Bill Studenmund <email@example.com>
Date: 10/24/2006 11:13:49
Content-Type: text/plain; charset=us-ascii
On Tue, Oct 24, 2006 at 07:06:08PM +0900, SODA Noriyuki wrote:
> >>>>> On Mon, 23 Oct 2006 20:29:12 -0700,
> Bill Studenmund <wrstuden@NetBSD.org> said:
> > I have a proposed fix for PR 32535, and I'd like other folks to look it
> > over.
> There is another PR about vnlock with nullfs, kern/32409.
> And it seems this patch didn't fix that case, at least.
> The vnlock problem happened on the machine with this fix.
Unfortunately I can't tell much from the PR so far.
It looks like something's going wrong with a vnode while the vnode lock is
held. Then directory lookup & such piles up on the locks, and we race for
root. My guess is that 0x70e91e50, the vnode most processes are piled up
on, is the root vnode.
How easy is this to reproduce? Is this what's taking ftp.n.o down?
I wish we had gdb. I have a script that is supposed to walk a vnode chain.
So you could point it at that vnode, it would find the process owning it,
and see what it's sleeping on. And so on until we find a vnode owned by a
process not sleeping on a vnode. _That_ is the source of the problem.
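To make the chain walk concrete, here is a rough sketch as a toy Python model. It is not the gdb script, and it does not read kernel memory. The function name and the two tables (lock_owner, wait_channel) are made up for illustration. In the sample data only 0x70e91e50 comes from the PR; the second address, the pids, and the "biowait" wait channel are invented.

```python
# Toy model of walking a vnode lock chain. All names and sample data are
# hypothetical; a real tool would read these fields out of the kernel.

def find_lock_chain_culprit(start_vnode, lock_owner, wait_channel):
    """Follow vnode -> owning process -> what that process sleeps on,
    until we reach a process that is not sleeping on a locked vnode.

    lock_owner:   maps vnode address -> pid holding that vnode's lock
    wait_channel: maps pid -> what the process sleeps on (a vnode
                  address, some other wait channel, or absent if running)
    Returns (pid, wait_target, path), where path lists the (vnode, pid)
    pairs visited in order.
    """
    path = []
    seen = set()
    vnode = start_vnode
    while True:
        if vnode in seen:
            # We came back to a vnode already visited: a true deadlock.
            raise RuntimeError("lock cycle at vnode %s" % vnode)
        seen.add(vnode)
        pid = lock_owner[vnode]
        target = wait_channel.get(pid)
        path.append((vnode, pid))
        if target not in lock_owner:
            # This process is not waiting on a locked vnode, so it is
            # the one holding everybody else up.
            return pid, target, path
        vnode = target


# Invented sample state: pid 101 holds the suspected root vnode and sleeps
# on a second vnode, whose owner (pid 202) sleeps on something else.
lock_owner = {"0x70e91e50": 101, "0xc1234000": 202}
wait_channel = {101: "0xc1234000", 202: "biowait"}

culprit = find_lock_chain_culprit("0x70e91e50", lock_owner, wait_channel)
print(culprit)
```

The cycle check is an addition of mine: if the chain ever loops back on itself, the problem is a plain deadlock rather than one stuck process.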
The other option I see is we can extend ddb's print routines so that it
will print the process that holds the lock on a vnode. Then you can dump a
vnode, see a proc, look at that proc's wait channel, and iterate.