port-sparc64: Re: crash dump failing on machine with 4GB

Subject: Re: crash dump failing on machine with 4GB
To: Chris Ross <cross+netbsd@distal.com>
From: Greg Oster <oster@cs.usask.ca>
List: port-sparc64
Date: 09/29/2007 19:51:17
Chris Ross writes:
> 
> On Sep 28, 2007, at 17:16, Juan RP wrote:
> > You are right, it's initialized in scsipi_base.c:scsipi_get_xs()...
> > but perhaps the callout was stopped previously and it wasn't
> > reinitialized or something.
> >
> > Someone with scsipi(9) clue should answer :-)
> 
>    Looks like it being called from esiop_scsicmd_end() isn't
> necessarily mapped to a scsipi_get_xs(), at least in the case of a
> crashdump.
> 
>    I put debugging printf()s in _init, and _stop, and in normal
> running, am seeing lots of things like:
[snip]
> dumping to dev 7,1 offset 4310231
> dump Calling callout_stop on 0x187ea98
> callout_stop: c 0x187ea98, c_magic 0
> panic: kernel diagnostic assertion "c->c_magic == CALLOUT_MAGIC" failed: file "/data/NetBSD/src/sys/kern/kern_timeout.c", line 431
> cpu0: kdb breakpoint at 141a3c0
> Stopped in pid 0.2 (system) at  netbsd:cpu_Debugger+0x4:        nop
> db>

Just as a data point, I'm seeing exactly the same thing:

dumping to dev 18,9 offset 1538567
dump panic: kernel diagnostic assertion "c->c_magic == CALLOUT_MAGIC" failed: file "/u1/devel/current/src/sys/kern/kern_timeout.c", line 427
Stopped in pid 0.2 (system) at  netbsd:cpu_Debugger+0x4:        popl    %ebp
db> tr
cpu_Debugger(c098f4ff,ca84b7cc,0,0,0) at netbsd:cpu_Debugger+0x4
panic(c09fa5f0,c094ccd1,c09646b2,c09c1298,1ab) at netbsd:panic+0x155
__assert(c094ccd1,c09c1298,1ab,c09646b2,0) at netbsd:__assert+0x39
callout_stop(c0a92250,c0e8e5d0,0,200,ca84ba74) at netbsd:callout_stop+0x189
ahc_done(c0e8e400,c0ee7310,380,40,a) at netbsd:ahc_done+0xab
ahc_run_qoutfifo(c0e8e400,1,32,1,2) at netbsd:ahc_run_qoutfifo+0xc4
ahc_execute_scb(c0a14a60,c0eec200,ca84ba74,200,0) at netbsd:ahc_execute_scb+0x68f
ahc_action(c0e8e43c,0,c0a92240,0,1b7740) at netbsd:ahc_action+0x4f5
sddump(415,177a07,0,ca84ba74,200) at netbsd:sddump+0x23d
raiddump(1209,177a07,0,ca84ba74,200) at netbsd:raiddump+0x209
cpu_dump(c0969edb,12,9,177a07,1) at netbsd:cpu_dump+0xe9
dumpsys(7,20,ca84bd10,104,0) at netbsd:dumpsys+0xf4
cpu_reboot(104,0,0,c095001c,0) at netbsd:cpu_reboot+0xaf
db_reboot_cmd(0,0,c0a6656d,ca84bd44,a) at netbsd:db_reboot_cmd+0x48
db_command(c095001c,c095021c,c0b70158,0,0) at netbsd:db_command+0xc6
db_command_loop(c0507b44,0,2,c0aad201,800) at netbsd:db_command_loop+0xd8
db_trap(1,0,58,c0aaf340,32) at netbsd:db_trap+0xdf
kdb_trap(1,0,ca84bf6c,7,7) at netbsd:kdb_trap+0xde
trap() at netbsd:trap+0x275
--- trap (number 1) ---
cpu_Debugger(c0eefc00,6,ca84bfdc,c0eefc70,c0ef108e) at netbsd:cpu_Debugger+0x4
comintr(c0eefc00,ca845c0c,0,uvm_fault(0xc0a8bba0, 0xca84c000, 1) -> 0xe
kernel: supervisor trap page fault, code=0
Faulted in DDB; continuing...
db> 

But this is on a reasonably recent -current on i386, using one of these:

ahc1 at pci0 dev 10 function 0: Adaptec 2940 SCSI adapter
ahc1: interrupting at irq 9
ahc1: aic7870: Single Channel A, SCSI Id=7, 16/253 SCBs

(So whatever is causing this issue, I don't think it's limited to
sparc64. I'll also note that this system doesn't have 4GB of RAM
either; it's just a K6-2@350MHz w/ 128MB RAM :) )
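
For anyone following along, here is a minimal user-space sketch of what that
assertion appears to be checking. It is not the code in kern_timeout.c: the
struct, the model_* function names and the magic value are all made up for
illustration, and the only things taken from the traces above are the c_magic
field name and the "c_magic 0" debug output. The point is just that a callout
which was never passed through an init routine (for example, one sitting in
zeroed memory on the dump path) fails the magic check as soon as something
tries to stop it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical magic value, chosen for this sketch only. */
#define MODEL_CALLOUT_MAGIC 0x11deeba1u

/* Simplified stand-in for a kernel callout; not the real layout. */
struct model_callout {
    unsigned int c_magic;   /* set by init, checked by every other call */
    bool         c_pending; /* true while the timer is armed */
};

/* Models an init routine: only this function stamps the magic value. */
static void model_callout_init(struct model_callout *c)
{
    memset(c, 0, sizeof(*c));
    c->c_magic = MODEL_CALLOUT_MAGIC;
}

/* Models arming the timer; refuses to touch an uninitialized callout. */
static void model_callout_schedule(struct model_callout *c)
{
    assert(c->c_magic == MODEL_CALLOUT_MAGIC);
    c->c_pending = true;
}

/*
 * Models a stop routine. Returns 0 on success and -1 at the point where
 * a DIAGNOSTIC kernel would hit the failed assertion and panic.
 */
static int model_callout_stop(struct model_callout *c)
{
    if (c->c_magic != MODEL_CALLOUT_MAGIC) {
        fprintf(stderr, "callout_stop: c %p, c_magic %#x\n",
            (void *)c, c->c_magic);
        return -1;
    }
    c->c_pending = false;
    return 0;
}
```

Calling model_callout_stop() on a struct that was only memset() to zero
returns -1, which corresponds to the panic in both traces; calling
model_callout_init() first makes the same stop call succeed. Whether the
right fix is to initialize the callout on the dump path or to skip the stop
there is a separate question for someone who knows scsipi(9).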

Later...

Greg Oster