Subject: Re: bootable RAID-1 array problems
To: Ray Phillips <r.phillips@jkmrc.com>
From: Greg Oster <oster@cs.usask.ca>
List: port-alpha
Date: 08/19/2004 22:20:18
Ray Phillips writes:
> Recently I've tried unsuccessfully to set up a RAID1 array following 
> the instructions at http://www.netbsd.org/guide/en/chap-rf.html. 
> First I used a -current system built from CVS sources updated on 26 
> July, then again with one built from an update done on 17 August.
> 
> I completed this successfully on an i386 machine (using the 26 July 
> sources) which worked, so I wonder if there's an alpha-specific 
> problem or (more likely) if I've done something wrong?

Nope....  This:

> The 
> console output for the SCSI pair at the point of the crash was:
> 
> RECON: initiating reconstruction on col 0 -> spare at col 2
> sd1(isp0:0:2:0):  Check Condition on CDB: 0x08 00 10 40 80 00
>      SENSE KEY:  Hardware Error
>       ASC/ASCQ:  ASC 0x44 ASCQ 0x9d
> 
> raid0: IO Error.  Marking /dev/sd1a as failed.
> raid0: Recon read failed!

and:

> and for the IDE pair:
> 
> Aug 19 17:11:41 www /netbsd: stray isa irq 14
> Warning: truncating spare disk /dev/wd0a to 4127616 blocks
> Aug 19 17:12:50 www su: ray to root on /dev/ttyp1
> RECON: initiating reconstruction on col 0 -> spare at col 2
> wd1a: error reading fsbn 1031488 of 1031488-1031615 (wd1 bn 1031488; 
> cn 1023 tng
> wd1: (uncorrectable data error)
[snip]
> 
> raid0: IO Error.  Marking /dev/wd1a as failed.
> raid0: Recon read failed!

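indicate hardware errors, and right now the reconstruction code 
in RAIDframe doesn't deal at all with those sorts of errors.  As the 
logs show, a single failed read gets the component marked as failed 
and the reconstruction aborted.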

> I suppose I was asking for trouble in the second case since wd1 has ~ 
> 63 K of bad sectors, but I'm pretty sure they were in the swap 
> partition so I thought they wouldn't be relevant.  I've no reason to 
> think there was a hardware problem with the SCSI setup.
[snip]

I can't tell from the error, but is it possible you fell off the end 
of the SCSI disk while doing the reconstruct?  Short of a real error, 
that's the only other thing I can think of right now.  (You should have 
seen the error when doing the '-i' initialization too, if that was 
the case.  Was the parity of the sets "clean" when you started the 
reconstruct?)
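
One way to check the "fell off the end" theory is to decode the CDB 
from the SCSI error by hand.  Opcode 0x08 is READ(6), which carries a 
21-bit starting block and a one-byte transfer length.  The following 
is only a rough sketch in Python: the function names are made up for 
illustration, and the disk size used in the usage note is a 
placeholder, not the real size of sd1 (take that from the disklabel).

```python
def decode_read6(cdb):
    """Decode a 6-byte SCSI READ(6) CDB into (lba, block_count)."""
    if len(cdb) != 6 or cdb[0] != 0x08:
        raise ValueError("not a READ(6) CDB")
    # The LBA is 21 bits: the low 5 bits of byte 1, then bytes 2 and 3.
    lba = ((cdb[1] & 0x1F) << 16) | (cdb[2] << 8) | cdb[3]
    # In READ(6) a transfer length of 0 means 256 blocks.
    count = cdb[4] or 256
    return lba, count


def read_in_bounds(lba, count, total_blocks):
    """True if the read [lba, lba + count) fits on a disk of total_blocks."""
    return lba + count <= total_blocks


# CDB bytes copied from the console output quoted above.
cdb = bytes.fromhex("08 00 10 40 80 00")
lba, count = decode_read6(cdb)
print("READ(6): lba=%d count=%d" % (lba, count))
```

For the CDB above this gives a read of 128 blocks starting at block 
4160, counted from the start of the whole disk rather than of sd1a.  
Calling read_in_bounds(lba, count, total_blocks) with the real block 
count of the disk then tells you whether that read could have run 
past the end.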

Later...

Greg Oster