Subject: Re: kern/31501: reproducible panics with 3.99.9/i386
To: None <gnats-bugs@netbsd.org>
From: Quentin Garnier <cube@cubidou.net>
List: tech-kern
Date: 10/07/2005 13:41:50
On Fri, Oct 07, 2005 at 11:05:01AM +0000, dive@endersgame.net wrote:
> >Number: 31501
> >Category: kern
> >Synopsis: NetBSD 3.99.9 panics in a repeatable manner on i386
[...]
> fd0 at fdc0 drive 0: 1.44MB, 80 cyl, 2 head, 18 sec
> uvm_fault(0xc0313b20, 0, 0, 1) -> 0xe
> kernel: page fault trap, code=0
> Stopped in pid 0.1 (swapper) at netbsd:bufq_alloc+0x6b: cmpl %ebx,0x8(%esi)
> db> t
> bufq_alloc(c15fa9c4,22,50,2,12) at netbsd:bufq_alloc+0x6b
> fdattach(c157e200,c15fa800,c035cee0,0,c03047a0) at netbsd:fdattach+0x75
> config_attach_loc(c157e200,c02fda50,0,c035cee0,c028fa00) at
> netbsd:config_attach_loc+0x2cc
> fdcfinishattach(c157e200,c01e15f0,0,c025fda80,358000) at
> netbsd:fdcfinishattach+0x110
> config_process_deferred(c0315144,0,c0314a0,bfeff000,c02fda80) at
> netbsd:config_process_deferred+0x46
> configure(0,1,0,0,0) at netbsd:configure+0x64
> main(0,0,0,0,0) at netbsd:main+0xc9
>
> At first I thought this might be something to do with the floppy driver, so I
> removed that and tried again, same result, just at a different point in the
> kernel. The hardware is fine, and the ram has been through memtest86 with no
> problems.
>
> More detail about the exact hardware configuration can be provided on request,
> but I do not think this is a hardware issue.
>
> Kernel config used:
> # jane.endersgame.net kernel configuration [NetBSD (current) i386]
> # AMD Athlon XP 2700+ (TBred) 2.16ghz, 256KB L2 cache, 1GB DDR333 SDRAM
> # $egnet: JANE,v 1.104 2005/10/07 10:32:31 dive Exp $
>
> machine i386 x86
>
> ident "JANE"

Typical issue of not including "conf/std". Yamamoto-san, I think it
would be better to have config(1) include it automatically, otherwise
we'll have reports like this any time we add something there.

So, as you have guessed by now, your problem is that you don't have any
bufq strategy compiled into your kernel. Now, the kernel should panic
instead of segfaulting (what happens is that it ends up with
bufq_strat_dummy and tries to call its init function, which is NULL).
Moreover, fdattach() explicitly requires BUFQ_DISKSORT, so we have to
make some sort of dependency happen there.

To sum up, we have 3 bugs:

  1. Segfault instead of panic. Easy enough to solve: we should test
     after the loop if we got bufq_strat_dummy, and panic. Or even
     make the #ifdef DEBUG block permanent and replace the printf with
     a panic().
  2. fdc/fd doesn't depend on BUFQ_DISKSORT in the config files. Should
     be easy enough to fix.
  3. Users don't know about conf/std and what to do with it. Proper way
     to fix that is left as an exercise to the reader. Even though I
     tend to agree with Yamamoto-san's point on having conf/std
     automatically included, users who know how config(1) works will
     still have a choice, while those who aren't always looking at
     source-changes (can't really blame them) won't be bitten in such
     ways that easily.
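
For illustration, the fix suggested in point 1 could look roughly like the
userland sketch below. It is not the actual bufq code: struct strat,
strat_lookup(), queue_alloc() and the two tables are hypothetical names made
up for this example, and the real bufq_alloc() has a different signature and
different data structures. The only thing it is meant to show is the check
after the loop, so that ending up on the dummy entry is reported instead of
calling through its NULL init pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a buffer queue strategy descriptor. */
struct strat {
	const char *name;
	int (*init)(void);	/* NULL for the dummy sentinel */
};

static int
disksort_init(void)
{
	return 0;
}

static const struct strat strat_disksort = { "disksort", disksort_init };
static const struct strat strat_dummy = { "dummy", NULL };

/* A kernel with one strategy compiled in, and one with none at all. */
static const struct strat *const table_with_disksort[] = {
	&strat_disksort, &strat_dummy
};
static const struct strat *const table_empty[] = { &strat_dummy };

/*
 * Walk a table terminated by the dummy entry. Return the strategy
 * matching `name`, or NULL if the loop stopped on the sentinel.
 */
static const struct strat *
strat_lookup(const struct strat *const *table, const char *name)
{
	const struct strat *const *p;

	assert(table != NULL && name != NULL);
	for (p = table; *p != &strat_dummy; p++) {
		if (strcmp((*p)->name, name) == 0)
			return *p;
	}
	return NULL;
}

/*
 * Return the result of the strategy's init function, or -1 if no
 * strategy was found. Kernel code would call panic() at that point
 * rather than return; the important part is that init is never
 * called through a NULL pointer.
 */
static int
queue_alloc(const struct strat *const *table, const char *name)
{
	const struct strat *s = strat_lookup(table, name);

	if (s == NULL)
		return -1;	/* assumption: panic("no bufq strategy") in a kernel */
	return s->init();
}
```

With table_with_disksort, queue_alloc() finds "disksort" and runs its init
function. With table_empty, which corresponds to a kernel built without any
strategy, it reports the failure explicitly.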

-- 
Quentin Garnier - cube@cubidou.net - cube@NetBSD.org
"When I find the controls, I'll go where I like, I'll know where I want
to be, but maybe for now I'll stay right here on a silent sea."
KT Tunstall, Silent Sea, Eye to the Telescope, 2004.