Subject: Re: Marking failed RAID 0 drive as not failed without replacement?
To: Manuel Bouyer <bouyer@antioche.eu.org>
From: Greg Oster <oster@cs.usask.ca>
List: port-alpha
Date: 09/22/2002 09:22:39
Manuel Bouyer writes:
> On Sat, Sep 21, 2002 at 05:52:30PM -0400, Paul Mather wrote:
> > On Sat, Sep 21, 2002 at 11:12:23PM +0200, Manuel Bouyer wrote:
> > 
> > => Can you post the output of raidctl -s raid0 ?
> > 
> > Here it is:
> > [...]
> 
> Well, this is strange. I don't think the component should have been marked
> failed in this case, the error should just have been reported to userland.
> This is likely a bug in raidframe.

Yes.  RAIDframe currently copes very poorly when more components fail than
the RAID level can tolerate.  (RAID 0 with one failed component just
"keeps on working".  RAID 5 with two failed components causes a panic.)

> Even though it did mark the component failed, it's likely that it is
> still using it, or the raid0 would be unusable.

Right.
 
> I suggest to run a raidctl -R on the failed component, to mark it as available
> again. Or you may have trouble after a reboot.

RAIDframe won't let you do a reconstruct on a RAID 0.  What you *might* be able
to get away with is: 

  raidctl -u raid0
  raidctl -C /etc/raid0.conf raid0
  raidctl -I 12345 raid0

i.e. unconfigure the set (-u), then tell it to "really configure" (-C, which
overrides the check for failed disks), and then get it to rewrite all the
component labels and such (-I, with 12345 as the serial number).
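
For anyone who wants to look at the sequence before committing to it, here is
a minimal sketch of the same three steps wrapped in a small sh script.  The
variable names (RAID_DEV, RAID_CONF, RAID_SERIAL, DRY_RUN) and the run()
helper are made up for this example; only the three raidctl invocations come
from the message above.  It defaults to printing the commands rather than
running them.

```shell
#!/bin/sh
# Sketch only: the three raidctl steps from above, with a dry-run switch.
# RAID_DEV, RAID_CONF, RAID_SERIAL and DRY_RUN are hypothetical names
# introduced for this example; the defaults are the values used above.
RAID_DEV=${RAID_DEV:-raid0}
RAID_CONF=${RAID_CONF:-/etc/raid0.conf}
RAID_SERIAL=${RAID_SERIAL:-12345}
DRY_RUN=${DRY_RUN:-1}    # 1 = only print the commands, 0 = really run them

# Print the command in dry-run mode; otherwise run it and stop on failure.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@" || { echo "failed: $*" >&2; exit 1; }
    fi
}

reconfigure_raid0() {
    run raidctl -u "$RAID_DEV"                  # unconfigure the set
    run raidctl -C "$RAID_CONF" "$RAID_DEV"     # force the configuration
    run raidctl -I "$RAID_SERIAL" "$RAID_DEV"   # rewrite component labels
}

reconfigure_raid0
```

Run it as-is to see what would be executed, and with DRY_RUN=0 only once the
printed commands look right for your setup.  The usual caveat stands: this
*might* work, it is not a guaranteed recovery.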

RAIDframe should have returned EIO (or something)... making it do that is on
my list, but it's not proving to be easy...

Later...

Greg Oster