Subject: Re: kern/31501: reproducible panics with 3.99.9/i386
To: Quentin Garnier <cube@cubidou.net>
From: Sean Davis <dive@endersgame.net>
List: tech-kern
Date: 10/07/2005 08:55:14
On Fri, Oct 07, 2005 at 01:41:50PM +0200, Quentin Garnier wrote:
> On Fri, Oct 07, 2005 at 11:05:01AM +0000, dive@endersgame.net wrote:
> > >Number: 31501
> > >Category: kern
> > >Synopsis: NetBSD 3.99.9 panics in a repeatable manner on i386
> [...]
> > fd0 at fdc0 drive 0: 1.44MB, 80 cyl, 2 head, 18 sec
> > uvm_fault(0xc0313b20, 0, 0, 1) -> 0xe
> > kernel: page fault trap, code=0
> > Stopped in pid 0.1 (swapper) at netbsd:bufq_alloc+0x6b: cmpl %ebx,0x8(%esi)
> > db> t
> > bufq_alloc(c15fa9c4,22,50,2,12) at netbsd:bufq_alloc+0x6b
> > fdattach(c157e200,c15fa800,c035cee0,0,c03047a0) at netbsd:fdattach+0x75
> > config_attach_loc(c157e200,c02fda50,0,c035cee0,c028fa00) at netbsd:config_attach_loc+0x2cc
> > fdcfinishattach(c157e200,c01e15f0,0,c025fda80,358000) at netbsd:fdcfinishattach+0x110
> > config_process_deferred(c0315144,0,c0314a0,bfeff000,c02fda80) at netbsd:config_process_deferred+0x46
> > configure(0,1,0,0,0) at netbsd:configure+0x64
> > main(0,0,0,0,0) at netbsd:main+0xc9
> >
> > At first I thought this might have something to do with the floppy driver, so I
> > removed it and tried again: same result, just at a different point in the
> > kernel. The hardware is fine, and the RAM has been through memtest86 with no
> > problems.
> >
> > More detail about the exact hardware configuration can be provided on request,
> > but I do not think this is a hardware issue.
> >
> > Kernel config used:
> > # jane.endersgame.net kernel configuration [NetBSD (current) i386]
> > # AMD Athlon XP 2700+ (TBred) 2.16ghz, 256KB L2 cache, 1GB DDR333 SDRAM
> > # $egnet: JANE,v 1.104 2005/10/07 10:32:31 dive Exp $
> >
> > machine i386 x86
> >
> > ident "JANE"
>
> This is the typical symptom of not including "conf/std". Yamamoto-san, I think
> it would be better to have config(1) include it automatically; otherwise
> we'll get reports like this every time we add something there.

Oh. Whoops. I've fallen into the (now obviously bad) habit of hand-rolling
config files with only what I need, without bothering to include
conf/std. I've included it now; assuming it boots (which it should), I'll
reply so you can close the bug.

> So, as you have probably guessed by now, your problem is that you don't
> have any bufq strategy compiled into your kernel. Still, the kernel
> should panic instead of segfaulting: what happens is that it ends up with
> bufq_strat_dummy and tries to call its init function, which is NULL.
>
> Moreover, fdattach() explicitly requires BUFQ_DISKSORT, so we need to
> make that dependency explicit somewhere.
>
> To sum up, we have 3 bugs:
>
> 1. Segfault instead of panic. Easy enough to solve: after the loop, we
> should test whether we ended up with bufq_strat_dummy, and panic if
> so. Or we could even make the #ifdef DEBUG block permanent and
> replace its printf with a panic().
>
> 2. fdc/fd doesn't declare a dependency on BUFQ_DISKSORT in the config
> files. Should be easy enough to fix.
>
> 3. Users don't know about conf/std and what to do with it. The proper way
> to fix that is left as an exercise to the reader, though I tend to
> agree with Yamamoto-san's point about including conf/std
> automatically: users who know how config(1) works would still have a
> choice, while those who don't follow source-changes closely (and you
> can't really blame them) wouldn't get bitten this easily.
>
> --
> Quentin Garnier - cube@cubidou.net - cube@NetBSD.org
> "When I find the controls, I'll go where I like, I'll know where I want
> to be, but maybe for now I'll stay right here on a silent sea."
> KT Tunstall, Silent Sea, Eye to the Telescope, 2004.