Subject: Re: PR 32535
To: SODA Noriyuki <soda@sra.co.jp>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 10/24/2006 11:13:49
On Tue, Oct 24, 2006 at 07:06:08PM +0900, SODA Noriyuki wrote:
> >>>>> On Mon, 23 Oct 2006 20:29:12 -0700,
> Bill Studenmund <wrstuden@NetBSD.org> said:
>
> > I have a proposed fix for PR 32535, and I'd like other folks to look it
> > over.
>
> There is another PR about the vnlock problem with nullfs, kern/32409.
> It seems this patch didn't fix that case, at least.
> The vnlock problem happened on a machine with this fix.
Ok.
Unfortunately I can't tell much from the PR so far.
It looks like something's going wrong with a vnode while the vnode lock is
held. Then directory lookup & such piles up on the locks, and we race for
root. My guess is that 0x70e91e50, the vnode most processes are piled up
on, is the root vnode.
How easy is this to reproduce? Is this what's taking ftp.n.o down?
I wish we had gdb. I have a script that is supposed to walk a vnode chain:
you could point it at that vnode, it would find the process holding the
vnode's lock, and see what that process is sleeping on. It would repeat
this until it found a vnode owned by a process that is not sleeping on a
vnode. _That_ is the source of the problem.
The other option I see is to extend ddb's print routines so that they
print the process holding the lock on a vnode. Then you can dump a vnode,
see the owning proc, look at that proc's wait channel, and iterate.
Take care,
Bill