Subject: Re: raidframe copyback blocks the whole system !?
To: Greg Oster <oster@cs.usask.ca>
From: Markus W Kilbinger <kilbi@rad.rwth-aachen.de>
List: current-users
Date: 06/25/2002 10:53:18
>>>>> "Greg" == Greg Oster <oster@cs.usask.ca> writes:

    Greg> It blocks the filesystem(s) on the RAID set... Unfortunately,
    Greg> this is a limitation of the copyback code.

Good/important to know!

    >> >> Trying it with 'raidctl -R ...', while the spare is in use,
    >> >> reconstructed the formerly failed disk smoothly, but left
    >> >> the spare disk in 'used_spare' state.
    >> 
    Greg> Hmm!! It really shouldn't be letting you do a 'raidctl -R'
    Greg> after you've already reconstructed to a spare... smells like
    Greg> a bug..
    >> 
    >> A reboot 'solved' this problem, but that's not the clean way for a
    >> raid system, anyway! ;-)

    Greg> Well... if you have hot-swap drives, then you could just
    Greg> stuff the new drive in place of the old one, and not worry
    Greg> about doing a copyback.. (i.e. the new drive you put in
    Greg> becomes the hot spare).

Hmm, how would that work!? The old/new drive still has state 'failed',
and the spare is 'used_spare'. So, what will happen if another drive
fails at this stage? How will the 'failed' drive become the new spare
_without_ a reboot?

    Greg> If you don't have hot-swap drives, then you'll have to take
    Greg> things off-line anyway, and at that point you can just
    Greg> shuffle the disks around :)

Yeah, but that's what I want to avoid in my (academic) scenario... ;-)

    -> send-pr?

    Greg> If you'd like. There may already be a PR, but I don't recall
    Greg> for sure.. I do know that this issue is on my RAIDframe
    Greg> "todo" list..

Ok, that's sufficient! ;-)
 
    >> So, copyback is the only (clean) way back from a used spare to
    >> the normal raid disk on a running machine?

    Greg> Technically, yes. However: the used spare should be just as
    Greg> good as a normal raid disk (unless you're using a slower
    Greg> disk or something.) If the used spare is exactly the same as
    Greg> the other drives in the array, I wouldn't even worry about
    Greg> doing the copyback -- at some point when you reboot, the
    Greg> used spare will be pulled into the array as a normal
    Greg> component (assuming you're using the autoconfig stuff). And
    Greg> in the meantime, the spare disk should function just as well
    Greg> as a normal component.

My thoughts were about the steps after 'used_spare' without rebooting
the machine (== hot-swap drives). Copyback seems to be the only way to
accomplish that, then. I was just looking for a similarly smooth
(non-blocking) procedure for handling the required drive swap, without
rebooting.

So it's only a very small issue, because the first disk failure
should have warned you about the disk problem anyway, giving you time
to plan the next steps.

Markus.