Subject: Re: kern/31501: reproducible panics with 3.99.9/i386
To: Quentin Garnier <cube@cubidou.net>
From: Sean Davis <dive@endersgame.net>
List: tech-kern
Date: 10/07/2005 08:55:14
On Fri, Oct 07, 2005 at 01:41:50PM +0200, Quentin Garnier wrote:
> On Fri, Oct 07, 2005 at 11:05:01AM +0000, dive@endersgame.net wrote:
> > >Number:         31501
> > >Category:       kern
> > >Synopsis:       NetBSD 3.99.9 panics in a repeatable manner on i386
> [...]
> > fd0 at fdc0 drive 0: 1.44MB, 80 cyl, 2 head, 18 sec
> > uvm_fault(0xc0313b20, 0, 0, 1) -> 0xe
> > kernel: page fault trap, code=0
> > Stopped in pid 0.1 (swapper) at netbsd:bufq_alloc+0x6b: cmpl    %ebx,0x8(%esi)
> > db> t
> > bufq_alloc(c15fa9c4,22,50,2,12) at netbsd:bufq_alloc+0x6b
> > fdattach(c157e200,c15fa800,c035cee0,0,c03047a0) at netbsd:fdattach+0x75
> > config_attach_loc(c157e200,c02fda50,0,c035cee0,c028fa00) at
> > netbsd:config_attach_loc+0x2cc
> > fdcfinishattach(c157e200,c01e15f0,0,c025fda80,358000) at
> > netbsd:fdcfinishattach+0x110
> > config_process_deferred(c0315144,0,c0314a0,bfeff000,c02fda80) at
> > netbsd:config_process_deferred+0x46
> > configure(0,1,0,0,0) at netbsd:configure+0x64
> > main(0,0,0,0,0) at netbsd:main+0xc9
> > 
> > At first I thought this might be something to do with the floppy driver, so I
> > removed that and tried again, same result, just at a different point in the
> > kernel. The hardware is fine, and the ram has been through memtest86 with no
> > problems.
> > 
> > More detail about the exact hardware configuration can be provided on request,
> > but I do not think this is a hardware issue.
> > 
> > Kernel config used:
> > # jane.endersgame.net kernel configuration [NetBSD (current) i386]
> > # AMD Athlon XP 2700+ (TBred) 2.16ghz, 256KB L2 cache, 1GB DDR333 SDRAM
> > # $egnet: JANE,v 1.104 2005/10/07 10:32:31 dive Exp $
> > 
> > machine		i386	x86
> > 
> > ident		"JANE"
> 
> Typical issue of not including "conf/std".  Yamamoto-san, I think it
> would be better to have config(1) automatically include it; otherwise
> we'll have reports like this any time we add something there.

Oh. Whoops. I've fallen into the (obviously, now, bad) habit of hand-rolling
config files using only what I need, without bothering to include
conf/std. I've included it now; assuming it boots (which it should), I'll
reply so you can close the bug.

> So, as you have guessed by now, your problem is that you don't have any
> bufq strategy compiled into your kernel.  Now, the kernel should panic
> instead of segfaulting (what happens is that it ends up with
> bufq_strat_dummy and tries to call its init function, which is NULL).
> 
> Moreover, fdattach() explicitly requires BUFQ_DISKSORT, so we have to
> make some sort of dependency happen there.
> 
> To sum up, we have 3 bugs:
> 
>  1.  Segfault instead of panic.  Easy enough to solve: we should test
>      after the loop whether we got bufq_strat_dummy, and panic if so.
>      Or even make the #ifdef DEBUG block permanent and replace the
>      printf with a panic().
> 
>  2.  fdc/fd doesn't depend on BUFQ_DISKSORT in the config files.  Should
>      be easy enough to fix.
> 
>  3.  Users don't know about conf/std and what to do with it.  The
>      proper way to fix that is left as an exercise to the reader.  I
>      tend to agree with Yamamoto-san's point on having conf/std
>      automatically included: users who know how config(1) works will
>      still have a choice, while those who aren't always looking at
>      source-changes (can't really blame them) won't be bitten this
>      easily.
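
For illustration, here is a minimal userland sketch of the check
suggested in item 1. It is a model, not the kernel code: the struct
layout, the enum, and the names lookup(), model_alloc() and compiled_in
are all hypothetical, and the -1 return stands in for where the kernel
would call panic(). It only shows the shape of the fix: if the lookup
falls through to the dummy strategy, fail loudly instead of calling its
NULL init function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model; none of these names are the real kernel API. */
enum strat_id { STRAT_NONE = -1, STRAT_FCFS = 0, STRAT_DISKSORT = 1 };

struct strat {
	enum strat_id id;
	const char *name;
	void (*init)(int *state);	/* NULL for the dummy sentinel */
};

void
fcfs_init(int *state)
{
	*state = 1;
}

/* Sentinel the lookup falls back to when no strategy matches. */
const struct strat strat_dummy = { STRAT_NONE, "dummy", NULL };
const struct strat fcfs = { STRAT_FCFS, "fcfs", fcfs_init };

/* Strategies "compiled in": only FCFS, i.e. a kernel without disksort. */
const struct strat *const compiled_in[] = { &fcfs };

/* Return the matching strategy, or the dummy if none was compiled in. */
const struct strat *
lookup(const struct strat *const *tab, size_t n, enum strat_id wanted)
{
	const struct strat *found = &strat_dummy;

	for (size_t i = 0; i < n; i++) {
		if (tab[i]->id == wanted) {
			found = tab[i];
			break;
		}
	}
	return found;
}

/* Returns 0 on success, -1 at the point where the kernel would panic. */
int
model_alloc(const struct strat *const *tab, size_t n, enum strat_id wanted,
    int *state)
{
	const struct strat *s = lookup(tab, n, wanted);

	/* The test after the loop: never call the dummy's NULL init. */
	if (s == &strat_dummy)
		return -1;	/* kernel: panic(...), message left open */

	assert(s->init != NULL);
	s->init(state);
	return 0;
}
```

With only FCFS in the table, asking for STRAT_DISKSORT returns -1
rather than dereferencing a NULL function pointer, which is the
behaviour change item 1 asks for.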
> 
> -- 
> Quentin Garnier - cube@cubidou.net - cube@NetBSD.org
> "When I find the controls, I'll go where I like, I'll know where I want
> to be, but maybe for now I'll stay right here on a silent sea."
> KT Tunstall, Silent Sea, Eye to the Telescope, 2004.