Subject: bootable RAID-1 array problems
To: None <port-alpha@netbsd.org>
From: Ray Phillips <r.phillips@jkmrc.com>
List: port-alpha
Date: 08/20/2004 12:44:54
Recently I've tried unsuccessfully to set up a RAID-1 array following
the instructions at http://www.netbsd.org/guide/en/chap-rf.html.
I first tried a -current system built from CVS sources updated on 26
July, then one built from sources updated on 17 August. The same
procedure worked on an i386 machine (using the 26 July sources), so
I wonder whether there's an alpha-specific problem or (more likely)
whether I've done something wrong.
Things seem to go OK until
# raidctl -F component0 raid0
is executed in section 23.8, "The first boot with RAID-1", at which
point the machine (a PWS 500 with 1 GB RAM) hangs. I've tried both a
pair of identical SCSI disks and a pair of identical IDE disks. The
console output for the SCSI pair at the point of the crash was:
RECON: initiating reconstruction on col 0 -> spare at col 2
sd1(isp0:0:2:0): Check Condition on CDB: 0x08 00 10 40 80 00
SENSE KEY: Hardware Error
ASC/ASCQ: ASC 0x44 ASCQ 0x9d
raid0: IO Error. Marking /dev/sd1a as failed.
raid0: Recon read failed!
panic: raidframe error at line 880 file
/usr/src/sys/dev/raidframe/rf_reconstruc
Stopped in pid 504.1 (raid_recon) at netbsd:cpu_Debugger+0x4:
ret zero,(ra)
db>
and for the IDE pair:
Aug 19 17:11:41 www /netbsd: stray isa irq 14
Warning: truncating spare disk /dev/wd0a to 4127616 blocks
Aug 19 17:12:50 www su: ray to root on /dev/ttyp1
RECON: initiating reconstruction on col 0 -> spare at col 2
wd1a: error reading fsbn 1031488 of 1031488-1031615 (wd1 bn 1031488;
cn 1023 tng
wd1: (uncorrectable data error)
wd0a: channel reset writing fsbn 1031232 of 1031232-1031359 (wd0 bn
1031232; cng
wd1a: error reading fsbn 1031488 of 1031488-1031615 (wd1 bn 1031488;
cn 1023 tng
wd1: (uncorrectable data error)
wd0a: channel reset writing fsbn 1031232 of 1031232-1031359 (wd0 bn
1031232; cng
wd1a: error reading fsbn 1031488 of 1031488-1031615 (wd1 bn 1031488;
cn 1023 tng
wd1: (uncorrectable data error)
wd0a: channel reset writing fsbn 1031232 of 1031232-1031359 (wd0 bn
1031232; cng
wd1a: error reading fsbn 1031488 of 1031488-1031615 (wd1 bn 1031488;
cn 1023 tng
wd1: (uncorrectable data error)
wd0a: channel reset writing fsbn 1031232 of 1031232-1031359 (wd0 bn
1031232; cng
wd1a: error reading fsbn 1031581 of 1031488-1031615 (wd1 bn 1031581;
cn 1023 tng
wd1: (uncorrectable data error)
wd0a: channel reset writing fsbn 1031232 of 1031232-1031359 (wd0 bn
1031232; cng
wd1a: error reading fsbn 1031581 of 1031488-1031615 (wd1 bn 1031581;
cn 1023 tn)
raid0: IO Error. Marking /dev/wd1a as failed.
raid0: Recon read failed!
panic: raidframe error at line 880 file
/usr/src/sys/dev/raidframe/rf_reconstruc
Stopped in pid 445.1 (raid_recon) at netbsd:cpu_Debugger+0x4:
ret zero,(ra)
db>
(I'm afraid long lines don't wrap on the console I used, so they're truncated.)
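
As an aside, the CDB in the SCSI error above can be decoded by hand to see where on the disk the failing read was. The following is a minimal sketch, assuming the standard 6-byte READ(6) layout (opcode 0x08, a 21-bit block address, a one-byte transfer length where 0 means 256). The function name decode_read6 is made up for illustration.

```python
def decode_read6(cdb_text):
    """Decode a 6-byte SCSI READ(6) CDB given as hex text."""
    # bytes.fromhex() skips spaces; only the leading "0x" must be removed.
    cdb = bytes.fromhex(cdb_text.replace("0x", ""))
    if len(cdb) != 6 or cdb[0] != 0x08:
        raise ValueError("not a 6-byte READ(6) CDB")
    # The block address is 21 bits: low 5 bits of byte 1, then bytes 2 and 3.
    lba = ((cdb[1] & 0x1F) << 16) | (cdb[2] << 8) | cdb[3]
    # A transfer length of 0 means 256 blocks in READ(6).
    blocks = cdb[4] or 256
    return {"lba": lba, "blocks": blocks}


info = decode_read6("0x08 00 10 40 80 00")
print(info)
```

Under that assumption the failing command was a 128-block read starting at block 4160, so the SCSI reconstruction stopped close to the start of sd1 rather than deep into the disk.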
I suppose I was asking for trouble in the second case, since wd1 has
~63 K of bad sectors, but I'm pretty sure they were in the swap
partition, so I thought they wouldn't be relevant. I've no reason to
think there was a hardware problem with the SCSI setup.
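
Whether the failing IDE blocks really fall in swap can be checked with a little arithmetic. This is only a sketch under two assumptions: that RAIDframe reserves 64 sectors at the start of each component (RF_PROTECTED_SECTORS in the NetBSD sources), and that the IDE array used the same a and b layout as the raid0 disklabel quoted later in this message, which is not stated anywhere in the post. The helper raid_partition_for is hypothetical.

```python
RF_PROTECTED_SECTORS = 64  # sectors reserved at the start of each component

# (offset, size) in sectors, copied from the raid0 disklabel in this message.
# Assumption: the IDE array was partitioned the same way.
RAID0_PARTITIONS = {
    "a": (0, 262144),
    "b": (262144, 1048576),
    "d": (1310720, 7069184),
}


def raid_partition_for(component_block):
    """Map a block number within the component to a raid0 partition letter."""
    raid_block = component_block - RF_PROTECTED_SECTORS
    for name, (offset, size) in RAID0_PARTITIONS.items():
        if offset <= raid_block < offset + size:
            return name
    return None


for fsbn in (1031488, 1031581):
    print(fsbn, raid_partition_for(fsbn))
```

With those assumptions both failing blocks do land in raid0b, the swap partition. That does not make them harmless here: reconstruction copies the whole component block by block, without regard to which partition a block belongs to, so an unreadable sector anywhere on the surviving disk can stop it.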
dmesg reports the SCSI disks as:
sd0 at scsibus0 target 0 lun 0: <DEC, RZ1CF-AF (C) DEC, 1614> disk fixed
sd0: async, 8-bit transfers
sd0: 4091 MB, 3708 cyl, 20 head, 113 sec, 512 bytes/sect x 8380080 sectors
sd0: sync (100.00ns offset 8), 8-bit (10.000MB/s) transfers, tagged queueing
sd1 at scsibus0 target 2 lun 0: <DEC, RZ1CF-AF (C) DEC, 1614> disk fixed
sd1: async, 8-bit transfers
sd1: 4091 MB, 3708 cyl, 20 head, 113 sec, 512 bytes/sect x 8380080 sectors
sd1: sync (100.00ns offset 8), 8-bit (10.000MB/s) transfers, tagged queueing
The disklabels I used were:
sd0 and sd1
-----------
#        size    offset     fstype [fsize bsize cpg/sgs]
 a:   8380080         0       RAID
 b:   1048576    262208       swap
 c:   8380080         0     unused      0     0
raid0
-----
#        size    offset     fstype [fsize bsize cpg/sgs]
 a:    262144         0     4.2BSD      0     0     0
 b:   1048576    262144       swap
 c:   8379904         0     unused      0     0     0
 d:   7069184   1310720     4.2BSD      0     0     0
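
The numbers in the two disklabels above can be cross-checked mechanically. A rough sketch follows, assuming 64 reserved sectors per component (RF_PROTECTED_SECTORS) and a stripe unit of 128 sectors. The stripe unit is taken from the guide's example configuration and is an assumption, since the raid0.conf actually used is not shown in this message.

```python
RF_PROTECTED_SECTORS = 64   # reserved at the start of each RAID component
SECT_PER_SU = 128           # assumption: stripe unit size from the guide's example

component_size = 8380080    # sd0a / sd1a size from the disklabel
raid0 = {                   # partition: (size, offset) from the raid0 disklabel
    "a": (262144, 0),
    "b": (1048576, 262144),
    "d": (7069184, 1310720),
}
raid0_c_size = 8379904
sd0b_offset = 262208

# Usable RAID size: component minus reserved area, rounded down to whole stripe units.
usable = (component_size - RF_PROTECTED_SECTORS) // SECT_PER_SU * SECT_PER_SU

# Partitions a, b, d should be contiguous and end exactly at the end of raid0.
end = 0
contiguous = True
for name in ("a", "b", "d"):
    size, offset = raid0[name]
    contiguous = contiguous and (offset == end)
    end = offset + size

# The dump/swap partition on the raw disk should sit 64 sectors after raid0b.
expected_sd0b_offset = raid0["b"][1] + RF_PROTECTED_SECTORS

print(usable, contiguous, end, expected_sd0b_offset)
```

Under these assumptions everything lines up: the computed usable size equals the raid0 c partition, the a, b and d partitions tile it with no gaps, and sd0b starts exactly 64 sectors after raid0b.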
/etc/fstab
----------
/dev/raid0a / ffs rw 1 1
/dev/raid0b none swap sw 0 0
/dev/raid0d /usr ffs rw 1 2
/dev/sd0b none swap dp 0 0
kernfs /kern kernfs rw
procfs /proc procfs rw,noauto
To install boot blocks I used:
# /usr/sbin/installboot -v /dev/rsd1c /usr/mdec/bootxx_ffs
Can you see anything awry with any of this?
Ray