Subject: MP kernel + RAIDframe vs. crash dumps
To: None <port-i386@netbsd.org>
From: Jeff Rizzo <riz@redcrowgroup.com>
List: port-i386
Date: 07/11/2004 12:53:18
About a week ago I opened a PR (kern/26187), and I've been asked to get a
crash dump if the problem happens again.  The trouble is that I've never
been able to get one from that machine (or from any of a few others I have
with an identical hardware configuration), because when I'm in ddb and
type 'sync', it just hangs after 'syncing disks...':

login: ~#Stopped at      netbsd:breakpoint+0x4:  leave
db{0}> sync
syncing disks...

Now that I need a crash dump, I've looked into it a little more.
I can consistently get a crash dump from a test system if and only if
I use the GENERIC kernel instead of GENERIC.MP.  (The system I want a dump
from runs a custom MP kernel, but I can reproduce this 'sync hangs' problem
with either the custom or the GENERIC.MP kernel.)

The only other unusual thing about this machine is that the boot
disk is a RAID1, using RAIDframe.   When I tested on the same machine,
with the same GENERIC.MP kernel, but a non-RAID1 disk, I got this:

Stopped at      netbsd:breakpoint+0x4:  leave
db{0}>
db{0}> sync
syncing disks... 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 giving up

dumping to dev 0,1 offset 2216
dump panic: wddump: polled command has been queued
Stopped at      netbsd:breakpoint+0x4:  leave
db{0}>
db{0}> sync

dumping to dev 0,1 offset 2216
dump device not ready


wd0: flush cache command didn't complete
rebooting...

It then rebooted normally, which seems preferable to hanging.   The
*second* time I tried it on this machine, I got the hang, same as with the
RAID disk.

So it seems to me that we have a problem with crash dumps on NetBSD/i386
with MP kernels.  Is this known?  I didn't see anything in the PR database.
Should I send-pr this?  Which part, exactly?  :)

Thanks for any input,
+j

-- 
Jeff Rizzo                                         http://www.redcrowgroup.com/