Subject: Re: Question about raidframe and reconstruction
To: Brian Buhrow <buhrow@lothlorien.nfbcal.org>
From: Greg Oster <oster@cs.usask.ca>
List: current-users
Date: 06/28/2001 09:57:53

Brian Buhrow writes:
> 	Hello folks.  I've been successfully running raidframe on a machine
> with a 1tb raid for about 6 months. Two days ago, one of the drives failed
> on the raid, and I was able to reconstruct the raid onto the spare.  Once
> the reconstruction was complete, raidctl showed the spare as being in use. 
> Before we could put on the replacement for the failed drive in the array,
> but after reconstruction said it was complete, the root disk died on the
> machine.  The root disk is not part of the array. 

One good reason to put root on RAID. :-/  (and swap too...)

> Now, with a new root
> disk, but with the same config raid0.conf file, when I run raidctl -s, it
> shows that the spare disk is just a spare, and the failed drive is still
> failed. 

Right.  If you're using "normal config", it believes what you told it in the 
file...  The man-page does talk about this a little, but it is perhaps not clear 
enough -- the raid0.conf would need to be changed to reflect that the spare is 
now an actual component.  (The "normal config" stuff is not smart enough to 
figure out that the spare is actually being used! :( )  Auto-config *is* smart
enough to figure this out.... all would have been well with auto-config.

> Worse, it says that the parity is dirty, a condition I expected,

It says the parity is dirty because of the hard crash, but hopefully 
it's mostly consistent... You still have n-1 disks.  That's enough to 
create the stuff that's on the n'th disk.  (There might be some corruption, 
but only the "normal" amount associated with a hard crash.)
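
As a rough illustration of why n-1 disks are enough: the sketch below assumes 
simple XOR parity of the RAID 4/5 kind (the thread does not state the RAID 
level), and the block contents and the xor_blocks helper are made up for the 
example.  It is not RAIDframe code, just the arithmetic behind the idea.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte column at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Hypothetical stripe: three data blocks plus the parity block computed from them.
data = [b"disk0 data", b"disk1 data", b"disk2 data"]
parity = xor_blocks(data)

# Pretend disk 1 has failed: XOR the surviving blocks with the parity block
# to regenerate the missing block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[1])
```

The same property is why dirty parity matters: if a crash leaves a stripe's 
parity out of step with its data, a block rebuilt this way for that stripe 
can be wrong.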

> given the hard crash it took when the root disk died in the night.
> 	What confuses me, however, is the apparent lack of state as to the 
> use of the spare disk. 

See above.  Autoconfig is what you wanted :(

> Since I got fully reconstructed to the spare,
> shouldn't I be able to recalculate parity using the spare?

If you've configured the array with 'raidctl -c' and then mounted it, you 
*can't* use the spare as a live component (you can reconstruct to it again 
though).  If you haven't mounted it (i.e. haven't touched any data on it), then 
you *can* use the spare as a live component (but in this case you'd have to 
use 'raidctl -C', as the component labels would be incorrect :(  Going this
way is a little tricky though, and may not be worth doing.)

>  And why does
> the spare show as not being in use now that I've rebooted?

Suckage on the part of 'normal config' (which is why I'm working on
having the 'normal config' use the 'autoconfig' stuff.)

> 	Is there a way out of this without losing the data on the raid array?

Yup.  You haven't actually lost any data yet.   It's just running in 
degraded mode when you want it to be in optimal mode.  Just do the 
reconstruct again.  

> I really don't understand the value of the spare if you lose it when you
> reboot.  Does the spare only work across reboots if you have autoconfigure
> on?

Yes.  The spare also works if you move it from the "START spare" section of 
raid0.conf to the "START disks" section.
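
A sketch of what that edit might look like.  The device names, the 
three-column geometry, the RAID level and the layout numbers are all 
hypothetical, since the original raid0.conf is not shown in this thread.  
Here /dev/sd2e is assumed to be the failed component and /dev/sd4e the spare 
it was reconstructed onto.  The spare is listed in the position the failed 
disk used to occupy, on the assumption that component order matters.

```
# Hypothetical raid0.conf after reconstruction onto the former spare.
# Before: disks were sd1e, sd2e, sd3e and "START spare" listed sd4e.
START array
# numRow numCol numSpare
1 3 0

START disks
/dev/sd1e
/dev/sd4e
/dev/sd3e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level
32 1 1 5

START queue
fifo 100
```

With no spare left, numSpare drops to 0 and the "START spare" section goes 
away, at least until a replacement disk is added as a new spare.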

When I can get rid of the old configuration goop, the new configuration 
code will use the autoconfig stuff, and will not suffer from the problem you 
just encountered :(

Later...

Greg Oster