Subject: Re: PR 32535
To: SODA Noriyuki <>
From: Bill Studenmund <>
List: tech-kern
Date: 10/24/2006 11:13:49

On Tue, Oct 24, 2006 at 07:06:08PM +0900, SODA Noriyuki wrote:
> >>>>> On Mon, 23 Oct 2006 20:29:12 -0700,
>       Bill Studenmund <> said:
> > I have a proposed fix for PR 32535, and I'd like other folks to look it
> > over.
> There is another PR about vnlock with nullfs, kern/32409.
> And it seems this patch didn't fix that case, at least.
> The vnlock problem happened on the machine with this fix.


Unfortunately I can't tell much from the PR so far.

It looks like something's going wrong with a vnode while the vnode lock is
held. Then directory lookups and such pile up on the locks, and we race for
root. My guess is that 0x70e91e50, the vnode most processes are piled up
on, is the root vnode.

How easy is this to reproduce? Is this what's taking ftp.n.o down?

I wish we had gdb. I have a script that is supposed to walk a vnode chain.
So you could point it at that vnode, it would find the process owning it,
and see what it's sleeping on. And so on until we find a vnode owned by a
process not sleeping on a vnode. _That_ is the source of the problem.
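
To make the walk concrete, here is a rough sketch of the idea in Python. It
is not the gdb script itself, and it does not read kernel memory: the Proc
class, the lock_holder table, and the second vnode address in the usage
note are all made up for illustration. It only shows the loop: vnode, to
the process holding its lock, to the vnode that process sleeps on, repeated
until a holder is not waiting on a vnode.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Proc:
    """Hypothetical stand-in for a kernel process entry."""
    pid: int
    # Address of the vnode whose lock this process is sleeping on,
    # or None if it is not sleeping on a vnode lock at all.
    waits_on_vnode: Optional[int] = None


def walk_vnode_chain(start: int, lock_holder: Dict[int, Proc]) -> List[Proc]:
    """Follow vnode -> lock holder -> vnode the holder waits on.

    lock_holder is an assumed table mapping a vnode address to the
    process holding that vnode's lock. The last process in the returned
    list is the one not sleeping on a vnode, unless the walk stopped
    early because of a cycle or an unheld lock.
    """
    chain: List[Proc] = []
    seen = set()
    vnode: Optional[int] = start
    while vnode is not None and vnode not in seen:
        seen.add(vnode)                  # guard against looping on a deadlock cycle
        holder = lock_holder.get(vnode)
        if holder is None:               # nobody holds this lock; chain ends
            break
        chain.append(holder)
        vnode = holder.waits_on_vnode    # next link, or None at the culprit
    return chain
```

For example, with lock_holder = {0x70e91e50: Proc(101, 0x70e91f00),
0x70e91f00: Proc(202)}, walking from 0x70e91e50 gives pids 101 and then
202, and pid 202 is the one to look at.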

The other option I see is that we can extend ddb's print routines so that
they print the process that holds the lock on a vnode. Then you can dump a
vnode, see a proc, look at that proc's wait channel, and iterate.

Take care,

