Subject: Re: DOM0 Page fault trap in NetBSD 3.0
To: TlorD <tld@tld.digitalcurse.com>
From: Manuel Bouyer <bouyer@antioche.eu.org>
List: port-xen
Date: 03/09/2006 21:59:43
On Thu, Mar 09, 2006 at 01:00:21PM +0100, TlorD wrote:
> Manuel Bouyer wrote:
> > On Wed, Mar 08, 2006 at 12:07:43AM +0100, TLorD wrote:
> >>> It's probably dereferencing a NULL pointer but we need to find which one.
> >>> In my local build aac_intr+0x23 is line 790 in sys/dev/ic/aac.c
> >>> but it may not match your kernel (at least at first glance I can't see why
> >>> this would cause a fault).
> >> I'd love to, I just need to know how.
> >> Do I have to send you the compiled kernel? Recompile with debug information
> >> (will take ages to copy :P )?
> >> As long as I can do that, I'll gladly help.
> >> One thing I can safely say is that the halt happens at a later stage than aac0
> >> attachment (it attaches the network cards after that and before the crash), so
> >> I guess it's something interrupt-related, possibly the ld0 recognition (but
> >> since I don't know how to scrollback, I'm quite helpless at that).
> > 
> > I think you just need to add a few printfs in the kernel, in
> > the aac_intr() function. Maybe just adding
> > #define AAC_DEBUG 0xff
> > at the top of aacvar.h would be enough to start with.
> > 
> 
> Been there, done that.
> dev/ic/aac.c lines 1440 and 1441 protest against %16D and a ', " "' too
> many;
> dev/ic/ld_aac.c line 188 protests against a parameter type mismatch.
> 
> I bluntly replaced %16D with %16llx and dereferenced the pointer (I guess
> it's wrong but I have no better idea), and forced a type cast on the
> parameter.
> 
> Anyway, it compiled.
> 
> The dmesg log goes like this:
> aac0 at pci1 dev 3 function 0: pci command status reg 0x08x Adaptec
> ASR-2410SA
> aac0: interrupting at irq 20, event channel 11
> set hardware up for i960Rxaac0: error establishing init structure
> fxp0 at ...........
> fx0: interrupting ..............
> fxp0: ethernet add...............
> inphy0 at fxp0.....
> inphy0: .....
> (gdb kicks in here)
> 
> Of particular interest is the message "aac0: error establishing init
> structure" after the i960Rx initialisation.

Yes, and someone reported the same problem on tech-kern.
I suspect an issue in the bus_dma code, but I fail to see where.
This is with 3.0, right? Could you try a current kernel, just in case it
has been fixed there?
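
About the %16D complaint: as far as I know, %D is a FreeBSD kernel printf
extension that takes a buffer pointer plus a separator string and prints the
bytes in hex, which would explain the extra ', " "' argument the compiler
objects to. Replacing it with %16llx and dereferencing the pointer would then
print only a single 64-bit value instead of 16 bytes. A minimal sketch of an
equivalent dump is below; hexdump_to() is a hypothetical helper of mine, not
something that exists in aac.c, and it is written as plain C so it can be
tried in userland first.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the aac driver): write `len` bytes of
 * `buf` into `out` as two-digit hex values separated by `sep`.
 * Returns the number of characters written (excluding the NUL),
 * or -1 if `out` is too small.
 */
static int
hexdump_to(char *out, size_t outlen, const void *buf, size_t len,
    const char *sep)
{
	const unsigned char *p = buf;
	size_t used = 0;
	size_t i;

	if (outlen == 0)
		return -1;
	out[0] = '\0';

	for (i = 0; i < len; i++) {
		/* No separator before the first byte. */
		int n = snprintf(out + used, outlen - used, "%s%02x",
		    i ? sep : "", p[i]);
		if (n < 0 || (size_t)n >= outlen - used)
			return -1;	/* output would be truncated */
		used += (size_t)n;
	}
	return (int)used;
}
```

The debug printf could then print the resulting string with a plain %s, for
example after calling hexdump_to(line, sizeof(line), ptr, 16, " ") on a
local char line[64] buffer.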

-- 
Manuel Bouyer <bouyer@antioche.eu.org>
     NetBSD: 26 years of experience will always make the difference