Subject: Re: kern/31501: reproducible panics with 3.99.9/i386
To: None <>
From: Quentin Garnier <>
List: tech-kern
Date: 10/07/2005 13:41:50

On Fri, Oct 07, 2005 at 11:05:01AM +0000, wrote:
> >Number:         31501
> >Category:       kern
> >Synopsis:       NetBSD 3.99.9 panics in a repeatable manner on i386
> fd0 at fdc0 drive 0: 1.44MB, 80 cyl, 2 head, 18 sec
> uvm_fault(0xc0313b20, 0, 0, 1) -> 0xe
> kernel: page fault trap, code=0
> Stopped in pid 0.1 (swapper) at netbsd:bufq_alloc+0x6b: cmpl    %ebx,0x8(=
> db> t
> bufq_alloc(c15fa9c4,22,50,2,12) at netbsd:bufq_alloc+0x6b
> fdattach(c157e200,c15fa800,c035cee0,0,c03047a0) at netbsd:fdattach+0x75
> config_attach_loc(c157e200,c02fda50,0,c035cee0,c028fa00) at
> netbsd:config_attach_loc+0x2cc
> fdcfinishattach(c157e200,c01e15f0,0,c025fda80,358000) at
> netbsd:fdcfinishattach+0x110
> config_process_deferred(c0315144,0,c0314a0,bfeff000,c02fda80) at
> netbsd:config_process_deferred+0x46
> configure(0,1,0,0,0) at netbsd:configure+0x64
> main(0,0,0,0,0) at netbsd:main+0xc9
> At first I thought this might be something to do with the floppy driver, so I
> removed that and tried again, same result, just at a different point in the
> kernel. The hardware is fine, and the ram has been through memtest86 with no
> problems.
> More detail about the exact hardware configuration can be provided on request,
> but I do not think this is a hardware issue.
> Kernel config used:
> # kernel configuration [NetBSD (current) i386]
> # AMD Athlon XP 2700+ (TBred) 2.16ghz, 256KB L2 cache, 1GB DDR333 SDRAM
> # $egnet: JANE,v 1.104 2005/10/07 10:32:31 dive Exp $
> machine		i386	x86
> ident		"JANE"

This is the typical result of not including "conf/std".  Yamamoto-san, I
think it would be better to have config(1) include it automatically;
otherwise we'll get reports like this every time we add something there.

So, as you may have guessed by now, your problem is that you don't have
any bufq strategy compiled into your kernel.  That said, the kernel should
panic instead of segfaulting (what happens is that it ends up with
bufq_strat_dummy and tries to call its init function, which is NULL).
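
To make the failure mode concrete, here is a minimal user-space sketch, not
the actual kernel code.  The names struct strat, strat_lookup() and
queue_alloc() are made up for illustration, and the lookup-by-name is an
assumption; only the idea of a dummy fallback with a NULL init pointer comes
from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a bufq strategy descriptor. */
struct strat {
	const char *name;
	int (*init)(void);	/* NULL for the dummy fallback */
};

static int fcfs_init(void) { return 0; }

static const struct strat strat_dummy = { "dummy", NULL };
static const struct strat strat_fcfs = { "fcfs", fcfs_init };

/* Walk the compiled-in strategies; fall back to the dummy if none match. */
static const struct strat *
strat_lookup(const struct strat *const *table, size_t n, const char *wanted)
{
	const struct strat *found = &strat_dummy;
	size_t i;

	assert(table != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (strcmp(table[i]->name, wanted) == 0) {
			found = table[i];
			break;
		}
	}
	return found;
}

/*
 * Returns 0 on success, -1 if only the dummy was found.  In a kernel the
 * -1 branch is where a panic() with a clear message would go, instead of
 * calling through the NULL init pointer.
 */
static int
queue_alloc(const struct strat *const *table, size_t n, const char *wanted)
{
	const struct strat *s = strat_lookup(table, n, wanted);

	if (s == &strat_dummy || s->init == NULL)
		return -1;
	return s->init();
}
```

With an empty table, queue_alloc(NULL, 0, "disksort") takes the error branch
rather than dereferencing the NULL init pointer, which is the behaviour one
would want to turn into an explicit panic.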

Moreover, fdattach() explicitly requires BUFQ_DISKSORT, so we have to
make some sort of dependency happen there.

To sum up, we have 3 bugs:

 1.  Segfault instead of panic.  Easy enough to solve: we should test
     after the loop whether we got bufq_strat_dummy, and panic if so.
     Or even make the #ifdef DEBUG block permanent and replace the
     printf with a panic().

 2.  fdc/fd doesn't depend on BUFQ_DISKSORT in the config files.  Should
     be easy enough to fix.

 3.  Users don't know about conf/std and what to do with it.  Proper way
     to fix that is left as an exercise to the reader.  Even though I
     tend to agree with Yamamoto-san's point on having conf/std
     automatically included, users who know how config(1) works will
     still have a choice, while those who aren't always looking at
     source-changes (can't really blame them) won't be bitten in such
     ways that easily.

Quentin Garnier
"When I find the controls, I'll go where I like, I'll know where I want
to be, but maybe for now I'll stay right here on a silent sea."
KT Tunstall, Silent Sea, Eye to the Telescope, 2004.
