Subject: RAIDframe crash
To: None <current-users@netbsd.org>
From: Chris Jones <chris@cjones.org>
List: current-users
Date: 05/08/2001 16:50:41

Three problems:

1. Panic, apparently having something to do with RAIDframe.

2. How to recover the RAID data after said panic.

3. I can't get savecore to do its thing and give me a core dump.

Problem 1:  Here's partial /var/log/messages output.  At the time of
the crash, I was copying a bunch of data onto a partition on raid1
(RAID 5) from an NFS mount.

May  7 16:02:01 gamera /netbsd: NetBSD 1.5.1_ALPHA (GAMERA) #0: Tue Jan 23 11:43:12 MST 2001
May  7 16:02:01 gamera /netbsd:     chris@legolas.mt.sri.com:/usr/local/tarpit/netbsd/syssrc/sys/arch/i386/compile/GAMERA
May  7 16:02:02 gamera /netbsd: fxp0 at pci0 dev 10 function 0: Intel i82557 Ethernet, rev 8
May  7 16:02:02 gamera /netbsd: fxp0: interrupting at irq 9
May  7 16:02:02 gamera /netbsd: fxp0: Ethernet address 00:90:27:87:b0:7e, 10/100 Mb/s
May  7 16:02:02 gamera /netbsd: inphy0 at fxp0 phy 1: i82555 10/100 media interface, rev. 4
May  7 16:02:02 gamera /netbsd: inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
May  7 16:02:02 gamera /netbsd: siop1 at pci0 dev 11 function 0: Symbios Logic 53c875 (ultra-wide scsi)
May  7 16:02:02 gamera /netbsd: siop1: using on-board RAM
May  7 16:02:02 gamera /netbsd: siop1: interrupting at irq 10
May  7 16:02:02 gamera /netbsd: scsibus1 at siop1: 16 targets, 8 luns per target
May  7 16:02:02 gamera /netbsd: siop0 at pci0 dev 14 function 0: Symbios Logic 53c875j (ultra-wide scsi)
May  7 16:02:02 gamera /netbsd: siop0: using on-board RAM
May  7 16:02:02 gamera /netbsd: siop0: interrupting at irq 9
May  7 16:02:02 gamera /netbsd: scsibus0 at siop0: 16 targets, 8 luns per target
May  7 16:02:03 gamera /netbsd: scsibus1: waiting 2 seconds for devices to settle...
May  7 16:02:03 gamera /netbsd: siop1: target 0 using 8bit transfers
May  7 16:02:03 gamera /netbsd: siop1: target 0 now synchronous at 20.0Mhz, offset 16
May  7 16:02:03 gamera /netbsd: siop1: target 0 using tagged queuing
May  7 16:02:03 gamera /netbsd: sd2 at scsibus1 target 0 lun 0: <IBM, DMVS36D, 0210> SCSI3 0/direct fixed
May  7 16:02:03 gamera /netbsd: siop1: target 0 using 16bit transfers
May  7 16:02:03 gamera /netbsd: siop1: target 0 now synchronous at 20.0Mhz, offset 16
May  7 16:02:03 gamera /netbsd: sd2: 35003 MB, 11739 cyl, 20 head, 305 sec, 512 bytes/sect x 71687340 sectors
May  7 16:02:03 gamera /netbsd: siop1: target 1 using 8bit transfers
May  7 16:02:03 gamera /netbsd: siop1: target 1 now synchronous at 20.0Mhz, offset 16
May  7 16:02:03 gamera /netbsd: siop1: target 1 using tagged queuing
May  7 16:02:03 gamera /netbsd: sd3 at scsibus1 target 1 lun 0: <IBM, DMVS36D, 0210> SCSI3 0/direct fixed
May  7 16:02:03 gamera /netbsd: siop1: target 1 using 16bit transfers
May  7 16:02:03 gamera /netbsd: siop1: target 1 now synchronous at 20.0Mhz, offset 16
May  7 16:02:03 gamera /netbsd: sd3: 35003 MB, 11739 cyl, 20 head, 305 sec, 512 bytes/sect x 71687340 sectors
May  7 16:02:03 gamera /netbsd: st0 at scsibus1 target 5 lun 0: <HP, C1557A, U812> SCSI2 1/sequential removable
May  7 16:02:03 gamera /netbsd: st0: siop1: target 5 now synchronous at 10.0Mhz, offset 16
May  7 16:02:03 gamera /netbsd: density code 37, variable blocks, write-enabled
May  7 16:02:04 gamera /netbsd: ch0 at scsibus1 target 5 lun 1: <HP, C1557A, U812> SCSI2 8/changer removable
May  7 16:02:04 gamera /netbsd: ch0: 6 slots, 1 drive, 0 pickers, 0 portals
May  7 16:02:04 gamera /netbsd: scsibus0: waiting 2 seconds for devices to settle...
May  7 16:02:04 gamera /netbsd: siop0: target 0 using tagged queuing
May  7 16:02:04 gamera /netbsd: sd0 at scsibus0 target 0 lun 0: <SEAGATE, ST34572W, 0784> SCSI2 0/direct fixed
May  7 16:02:04 gamera /netbsd: siop0: target 0 using 16bit transfers
May  7 16:02:04 gamera /netbsd: siop0: target 0 now synchronous at 20.0Mhz, offset 15
May  7 16:02:04 gamera /netbsd: sd0: 4340 MB, 6300 cyl, 8 head, 176 sec, 512 bytes/sect x 8888924 sectors
May  7 16:02:04 gamera /netbsd: siop0: target 1 using tagged queuing
May  7 16:02:04 gamera /netbsd: sd1 at scsibus0 target 1 lun 0: <SEAGATE, ST32272W, 0784> SCSI2 0/direct fixed
May  7 16:02:04 gamera /netbsd: siop0: target 1 using 16bit transfers
May  7 16:02:04 gamera /netbsd: siop0: target 1 now synchronous at 20.0Mhz, offset 15
May  7 16:02:04 gamera /netbsd: sd1: 2157 MB, 6300 cyl, 4 head, 175 sec, 512 bytes/sect x 4419464 sectors
May  8 16:10:46 gamera /netbsd: sd2(siop1:0:0): command timeout
May  8 16:10:46 gamera /netbsd: siop1: scsi bus reset
May  8 16:10:46 gamera /netbsd: cmd 0xc0670240 (target 0:0) in reset list
May  8 16:10:46 gamera /netbsd: cmd 0xc0670100 (target 0:0) in reset list
May  8 16:10:46 gamera /netbsd: cmd 0xc06701c0 (target 0:0) in reset list
May  8 16:10:46 gamera /netbsd: cmd 0xc0670300 (target 0:0) in reset list
May  8 16:10:46 gamera /netbsd: cmd 0xc0670080 (target 1:0) in reset list
May  8 16:10:46 gamera /netbsd: cmd 0xc06702c0 (target 1:0) in reset list
May  8 16:10:46 gamera /netbsd: cmd 0xc0670000 (target 1:0) in reset list
May  8 16:10:46 gamera /netbsd: cmd 0xc0670280 (target 1:0) in reset list
May  8 16:10:46 gamera /netbsd: cmd 0xc0670240 (status 2) about to be processed
May  8 16:10:46 gamera /netbsd: cmd 0xc0670100 (status 2) about to be processed
May  8 16:10:46 gamera /netbsd: cmd 0xc06701c0 (status 2) about to be processed
May  8 16:10:46 gamera /netbsd: cmd 0xc0670300 (status 2) about to be processed
May  8 16:10:46 gamera /netbsd: cmd 0xc0670080 (status 2) about to be processed
May  8 16:10:46 gamera /netbsd: cmd 0xc06702c0 (status 2) about to be processed
May  8 16:10:46 gamera /netbsd: cmd 0xc0670000 (status 2) about to be processed
May  8 16:10:46 gamera /netbsd: cmd 0xc0670280 (status 2) about to be processed
May  8 16:10:46 gamera /netbsd: siop1: target 0 using 16bit transfers
May  8 16:10:46 gamera /netbsd: siop1: target 0 now synchronous at 20.0Mhz, offset 16
May  8 16:10:46 gamera /netbsd: siop1: target 1 using 16bit transfers
May  8 16:10:46 gamera /netbsd: siop1: target 1 now synchronous at 20.0Mhz, offset 16
May  8 16:10:47 gamera /netbsd: siop1: unexpected phase mismatch 6
May  8 16:10:47 gamera /netbsd: sd3(siop1:1:0): parity error
May  8 16:10:47 gamera /netbsd: siop1: scsi bus reset
May  8 16:10:47 gamera /netbsd: cmd 0xc06700c0 (target 0:0) in reset list

...and then it crashed.  The console had some message about RAIDframe
being unable to allocate a DAG.  I didn't write it down or get a
backtrace, because I knew it would make a core dump.  :-/

Problem 2:  I'd like to get raid1 back up again, but it won't
configure:
May  8 16:38:31 gamera /netbsd: raidlookup on device: /dev/sd4e failed!
May  8 16:38:31 gamera /netbsd: Hosed component: /dev/sd4e
May  8 16:38:31 gamera /netbsd: raid1: Too many different mod counters!
May  8 16:38:31 gamera /netbsd: raid1: Component /dev/sd2e being configured at row: 0 col: 0
May  8 16:38:31 gamera /netbsd:          Row: 0 Column: 0 Num Rows: 1 Num Columns: 3
May  8 16:38:31 gamera /netbsd:          Version: 2 Serial Number: 2001050700 Mod Counter: 35
May  8 16:38:31 gamera /netbsd:          Clean: No Status: 0
May  8 16:38:31 gamera /netbsd: /dev/sd2e is not clean!
May  8 16:38:31 gamera /netbsd: raid1: Component /dev/sd3e being configured at row: 0 col: 1
May  8 16:38:31 gamera /netbsd:          Row: 0 Column: 1 Num Rows: 1 Num Columns: 3
May  8 16:38:31 gamera /netbsd:          Version: 2 Serial Number: 2001050700 Mod Counter: 36
May  8 16:38:31 gamera /netbsd:          Clean: No Status: 0
May  8 16:38:31 gamera /netbsd: /dev/sd3e has a different modfication count: 35 36
May  8 16:38:31 gamera /netbsd: /dev/sd3e is not clean!
May  8 16:38:31 gamera /netbsd: raid1: Ignoring /dev/sd4e
May  8 16:38:31 gamera /netbsd: raid1: There were fatal errors
May  8 16:38:31 gamera /netbsd: Closing vnode for row: 0 col: 0
May  8 16:38:31 gamera /netbsd: Closing vnode for row: 0 col: 1
May  8 16:38:31 gamera /netbsd: Closing vnode for row: 0 col: 2
May  8 16:38:31 gamera /netbsd: vnode was NULL
May  8 16:38:31 gamera /netbsd: RAIDFRAME: failed rf_ConfigureDisks with 22
May  8 16:38:31 gamera /netbsd: Closing vnode for row: 0 col: 0
May  8 16:38:31 gamera /netbsd: vnode was NULL
May  8 16:38:31 gamera /netbsd: Closing vnode for row: 0 col: 1
May  8 16:38:31 gamera /netbsd: vnode was NULL
May  8 16:38:31 gamera /netbsd: Closing vnode for row: 0 col: 2
May  8 16:38:31 gamera /netbsd: vnode was NULL

Now I could certainly force it to configure, in spite of the different
mod counters, but I'd rather wait on advice from persons more
knowledgeable.
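
To make the mismatch easier to see, here is a minimal sketch in Python (not
part of RAIDframe or raidctl) that pulls the per-component mod counters out
of log lines like the ones above.  The function name mod_counters and the
regular expressions are illustrative assumptions about the log format, and
the sample text is the output above with the syslog prefix removed.

```python
import re

# Sample lines from the failed configuration attempt, syslog prefix removed.
# Any other input layout is an assumption this sketch does not handle.
LOG = """\
raid1: Component /dev/sd2e being configured at row: 0 col: 0
         Row: 0 Column: 0 Num Rows: 1 Num Columns: 3
         Version: 2 Serial Number: 2001050700 Mod Counter: 35
raid1: Component /dev/sd3e being configured at row: 0 col: 1
         Row: 0 Column: 1 Num Rows: 1 Num Columns: 3
         Version: 2 Serial Number: 2001050700 Mod Counter: 36
"""

COMPONENT = re.compile(r"Component (\S+) being configured")
MOD_COUNTER = re.compile(r"Mod Counter: (\d+)")


def mod_counters(log_text):
    """Map each component named in the log to its reported mod counter."""
    counters = {}
    current = None
    for line in log_text.splitlines():
        match = COMPONENT.search(line)
        if match:
            # Remember which component the following label lines belong to.
            current = match.group(1)
            continue
        match = MOD_COUNTER.search(line)
        if match and current is not None:
            counters[current] = int(match.group(1))
    return counters


counters = mod_counters(LOG)
print(counters)
print("counters agree:", len(set(counters.values())) == 1)
```

This only reports what the component labels say; it does not decide which
component is the stale one.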

Problem 3:  After reboot, I didn't get a core dump.  My fstab has the
line:
/dev/sd0b none swap dp 0 0
...which means to use /dev/sd0b as the dump device -- don't know if
that has anything to do with it.  During boot, savecore runs before
swapctl, meaning that the swap partition hasn't been set yet when
savecore is run.  Looking through savecore.c, it appears that it reads
the kernel symbols to determine where the dump device is -- this seems
wrong; shouldn't it read fstab instead of or in addition to the
kernel?  At any rate, the ordering is definitely wrong.  After I've
booted and set the dump device, though:

gamera# swapctl -D /dev/sd0b
swapctl: setting dump device to /dev/sd0b
gamera# savecore -f -v -z /var/crash
dumplo = 0 (0 * 512)
savecore: no core dump

...so I can't get a core dump or backtrace to help with problem number
1, above.

Any help in solving any of these three problems would be much
appreciated, and might even contribute to the other two.  :)

Chris

-- 
---------------------------------------------------- chris@cjones.org
Chris Jones                                          Mad scientist at large
  www.netbsd.org www.postgresql.org www.schemers.org www.python.org
