port-sparc64: Re: crash dump failing on machine with 4GB

Subject: Re: crash dump failing on machine with 4GB
To: matthew green <mrg@eterna.com.au>
From: Chris Ross <cross+netbsd@distal.com>
List: port-sparc64
Date: 09/27/2007 23:38:30
On Sep 27, 2007, at 20:04, matthew green wrote:
>> dumping to dev 7,1 offset 4310231
>> dump 4096 esiop0: unable to load cmd DMA map: -1i/o error
>> sd0(esiop0:0:0:0): polling command not done
>> panic: scsipi_execute_xs
>> cpu0: kdb breakpoint at 13f3e80
>> Stopped in pid 0.2 (system) at  netbsd:cpu_Debugger+0x4:        nop
>> db>
>
>       So, does anyone have any suggestions on where I should go from
>    here?  I looked into the "unable to load cmd DMA map" error, which is
>    returning an EIO from a call to bus_dmamap_load().  Should I try to
>    track down into that function (via the macro, etc) and figure out if
>    it's returning an EIO for some reason relating to the physical memory
>    address it's given?  Or, can someone look at the code in doshutdown()
>    to see if the physical memory mapping calls "look right"?  I was
>    looking at amd64, figuring that it would be more likely to have this
>    functionality working, and I notice that the pmap_* call(s) it uses
>    are different, but that may not be unusual...
>
> you mean its bus_dmamap_load() is different?  yeah, that is gonna
> be expected..

   No, actually I meant that the pmap_* calls in the dumpsys code in the
respective machdep.c files are different.  amd64 uses pmap_map() before
calling the bdev's dump function, while the sparc64 code uses
pmap_kenter_pa() followed by pmap_update().  Again, this isn't code I
know anything about; I was just asking whether it was expected that
different pmap*() routines would be used on the two architectures.

   The bus_dmamap_load() call, in dev/ic/esiop.c, was just what I found
when looking for where that printed string came from...

> hmm, i don't see how sparc64 bus_dmamap_load() could return EIO?
> see machdep.c:_bus_dmamap_load().  oh, the message above says it
> returns -1... which also seems not possible...
>
> is the above text exactly what it says?  i don't see where the
> "i/o error" comes from?  there should be a newline after the -1.
> (perhaps you changed this?)

   I hadn't yet chased down where the code behind bus_dmamap_load()
lives on sparc64.  That's what I was asking about.  I see that it's a
macro in bus.h, but I hadn't tracked it past that to real code.

   The text is exactly what it says.  The "i/o error" string is printed
after the call to the bdevsw's dump function in dumpsys(), in machdep.c.
It reports the return value of that dump function, which I think is the
d_dump member of the bdevsw for the device being dumped to.  That's
what's returning the EIO.

   Sorry to have confused things by asking about two different things in
the same paragraph.  :-)  I *presume* that the dump function is returning
the EIO *because* the underlying bus_dmamap_load() is failing inside the
esiop code.  But, again, I hadn't tracked that down yet.
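   To make the presumed error path concrete, here is a minimal user-space
sketch in C.  Every function name in it is hypothetical: the real esiop
dump path and the sparc64 _bus_dmamap_load() were not traced, and the
status strings are modelled on the traditional BSD dumpsys() switch, so
treat them as an assumption rather than a copy of the sparc64 machdep.c.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for bus_dmamap_load(): always fails,
 * returning -1 as in the console log quoted at the top. */
static int hypothetical_dmamap_load(void)
{
	return -1;
}

/* Hypothetical d_dump-style routine: if the DMA map cannot be
 * loaded, report it and hand EIO back to the caller. */
static int hypothetical_dump(void)
{
	int error = hypothetical_dmamap_load();

	if (error != 0) {
		printf("esiop0: unable to load cmd DMA map: %d\n", error);
		return EIO;
	}
	return 0;
}

/* Translate the dump routine's return value into the status text
 * printed after "dump ..." (strings assumed, modelled on BSD dumpsys()). */
static const char *dump_status(int error)
{
	switch (error) {
	case 0:
		return "succeeded";
	case ENXIO:
		return "device bad";
	case EFAULT:
		return "device not ready";
	case EINVAL:
		return "area improper";
	case EIO:
		return "i/o error";
	case EINTR:
		return "aborted from console";
	default:
		return "error";
	}
}
```

Calling dump_status(hypothetical_dump()) yields "i/o error", which matches
the tail of the reported console line.  This only illustrates the
presumption; it does not confirm where the EIO really originates.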

>>    ps,
>>       Is the last argument to sd_flush(), quoted in the backtrace above,
>>    indicative of a problem?  Just looks "odd" compared to the rest of
>>    the parameters.
>
> it is just garbage on the stack.  looking at sd.c:
>
> 	static int sd_flush(struct sd_softc *, int);
>
> so only the first two arguments are relevant.

   Ahh.  Cool.  Thanks.  That was information I needed.  :-)  That makes
much more sense now...

                                                   - Chris