current-users: Re: raid 0 failure

Subject: Re: raid 0 failure
To: Kevin Sullivan <ksulliva@psc.edu>
From: Greg Oster <oster@cs.usask.ca>
List: current-users
Date: 05/10/2000 13:03:30

Kevin Sullivan writes:
> I'm running NetBSD -current (about 1 month old) with a 4 disk raid0
> partition (a news spool).  Last night one of the disks hiccupped.

"Uh oh!" :(

> Now the
> raid0 set will not work.  The disk itself seems fine, but raidframe will
> not let the disk back into the stripe.  It says "/dev/sd16e has a different
> modfication count: 147 88".

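Right.  The modification count tells RAIDframe that the 'bad' disk is no 
longer 'in sync' with the remaining disks.  For the redundant RAID levels 
(1, 4 and 5) this is important, because a stale component must not be 
trusted when data is rebuilt.  RAID 0 has no redundancy, so the check 
simply stops the set from being configured.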
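A minimal sketch of the idea, assuming the check compares each component's modification counter against the value that most components agree on.  This is an illustration only, not RAIDframe's actual code: check_mod_counters() is a hypothetical name, the component names other than sd16e are made up, and 147/88 just reuse the numbers from the error message above.

```python
from collections import Counter

def check_mod_counters(mod_counts):
    """Hypothetical sketch, not RAIDframe's code.

    mod_counts maps component name -> modification counter read from its
    component label.  Returns (expected, stale): the counter value most
    components agree on, and the components whose counter differs.
    """
    # Assumption: the counter shared by the most components is "current".
    expected, _ = Counter(mod_counts.values()).most_common(1)[0]
    stale = {name: count for name, count in mod_counts.items()
             if count != expected}
    return expected, stale

# Hypothetical 4-disk set: sd16e missed the label updates after its hiccup.
labels = {"sd13e": 147, "sd14e": 147, "sd15e": 147, "sd16e": 88}
expected, stale = check_mod_counters(labels)
for name, count in stale.items():
    print(f"{name}: expected {expected}, found {count}")
```

Run as-is, this prints "sd16e: expected 147, found 88".  With RAID 0 there is no redundant copy to rebuild sd16e from, so a normal configure has to refuse; forcing it accepts whatever is on the disk as-is.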

> Is there a way to force raidframe to accept this disk?  

Yes.  Use 'raidctl -C ...' instead of 'raidctl -c ...' to configure the set; 
-C forces the configuration to take place.  Should you decide to risk 
everything and keep using the RAID set without getting that disk replaced, 
you'll also need to do a 'raidctl -I ...' to get the component labels back 
in sync again.  I can't predict how much data would be lost, though.  
Failures on RAID 0 sets are typically bad news, and things might get fairly 
messed up.  On the other hand, you might be lucky: if the drive doesn't 
fail in exactly the same place again, you may be able to recover most of 
the data.
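Why RAID 0 failures tend to be such bad news can be sketched with the usual striping arithmetic.  This assumes plain round-robin striping with a fixed stripe unit (the textbook layout, not code taken from RAIDframe), and the disk count, stripe unit size and file layout below are hypothetical.

```python
def map_block(logical_block, num_disks, stripe_unit):
    """Map a RAID 0 logical block to (disk_index, physical_block).

    Assumes plain round-robin striping with stripe_unit blocks per
    stripe unit (the textbook layout; not taken from RAIDframe).
    """
    unit, offset = divmod(logical_block, stripe_unit)
    disk = unit % num_disks    # stripe units rotate across the disks
    row = unit // num_disks    # full stripes that precede this unit
    return disk, row * stripe_unit + offset

def blocks_on_disk(start, length, disk, num_disks, stripe_unit):
    """Logical blocks in [start, start + length) stored on `disk`."""
    return [b for b in range(start, start + length)
            if map_block(b, num_disks, stripe_unit)[0] == disk]

# Hypothetical: 4 disks, 32-block stripe units, and a 256-block file
# laid out contiguously starting at logical block 0.
lost = blocks_on_disk(0, 256, disk=3, num_disks=4, stripe_unit=32)
print(len(lost), "of 256 blocks sit on disk 3")
```

This prints "64 of 256 blocks sit on disk 3": any file that spans more than one full stripe has pieces on every disk, so a single bad component can damage many files at once, with no parity or mirror to recover them from.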

> The data will be a
> bit inconsistent, but fsck should fix that.  It's a news spool so a few
> missing or corrupted files are okay.

"Good Luck" :)

Later...

Greg Oster