tech-kern archive


Re: RAIDframe: what if a disc fails during copyback

	Hello.  Note that RAIDframe's notion of a hot spare differs somewhat
from that of other software RAID systems: once you reboot after
reconstructing onto a hot spare, that spare becomes just another component
of the RAID set.  In other words, it loses its hot spare designation and
you should treat it as you would any other component.  That means you can
use raidctl -R (in-place reconstruction onto a component) to replace the
spare with the original disk now that you have it repaired.

	Assuming the original component, 'A' in Mouse's example, is still
good, if 'B' fails during the reconstruction, you're left with a
single-component RAID 1 set again.  If 'A' fails during the copy, you're
left with some corrupt data, though the system will not panic and you'll
be able to salvage what you can from the RAID.  Unfortunately, I've been
caught in this situation more times than I'd like to say -- there was a
crop of bad Seagate 500GB disks for a while and they had a tendency to
fail en masse.
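
	For what it's worth, the outcomes discussed in this thread can be
written down as a small lookup table.  This is only a toy Python model of
the reasoning here, not anything RAIDframe actually runs; the names
OUTCOMES and outcome_during_rebuild are made up for illustration, and the
component letters follow Mouse's example (A is the surviving source, B is
the disk being rebuilt, C is an idle spare).

```python
# Toy model of the failure cases discussed in this thread.
# Hypothetical names; this is NOT RAIDframe code or its API.
# A = surviving source, B = rebuild target, C = idle spare.

# Maps the component that fails mid-rebuild to (data_at_risk, note).
OUTCOMES = {
    # Target lost: the source is untouched, so nothing is lost,
    # but the set is back to a single good component.
    "B": (False, "single-component RAID 1 running on A"),
    # Source lost before the copy finished: B holds a partial copy.
    "A": (True, "partial copy on B; salvage what is readable"),
    # Idle spare lost: the copy from A to B is not affected.
    "C": (False, "rebuild continues; no spare left afterwards"),
}


def outcome_during_rebuild(failed):
    """Return (data_at_risk, note) for a component failing mid-rebuild."""
    try:
        return OUTCOMES[failed]
    except KeyError:
        raise ValueError("unknown component: %r" % (failed,))


if __name__ == "__main__":
    for name in ("A", "B", "C"):
        at_risk, note = outcome_during_rebuild(name)
        print("%s fails: data at risk = %s (%s)" % (name, at_risk, note))
```

	The only case in the table that puts data at risk is losing the
source while the copy is still in flight, which is the one I kept hitting.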


On Oct 29,  1:53pm, Mouse wrote:
} Subject: Re: RAIDframe: what if a disc fails during copyback
} > In a RAIDframe RAID-1, a disc failed and I reconstructed on a spare.
} > Now I want to replace the failed component (actually by the same
} > disc, which needed a firmware update) and want to copyback to it.
} So, let me make sure I understand you correctly.
} So you have drives A, B, and C.  A and B were live.  Let's say B is the
} one that failed.  You reconstructed onto C and have been running with A
} and C.
} Now you have a new B (which in this case is the same hardware with new
} firmware) and want to put it back into service.  I'm not sure whether
} you want to put it into service in place of A or in place of C.  I'm
} going to assume C.
} So, you'd pull C, replace it with B, and initiate a reconstruct, which
} for RAID 1 means copying from A to B.  Right?
} > How will RAIDframe behave if, during the copyback:
} > 1. The replaced component fails
} Is this B?  Or C?  Because it sounds to me as though C would be out of
} service at this point.
} > 2. The spare fails
} Which is "the spare"?  Are you running with a hot spare?  I think a hot
} spare failing means nothing until/unless RAIDframe tries to fall back
} on it.
} > 3. The other, non-replaced component fails?
} That would be A?
} > Specifically: Is there any scenario (other than more than one disc
} > failing) that will put the RAID into a non-redundant state?  I guess
} > 3. may?
} For RAID 1 in general, as soon as you have only one non-failed drive,
} you have no redundancy.  Based on the assumption that RAIDframe RAID 1
} cannot handle more than two drives (always true as far as I know, and
} the 9.0 raidctl(8) manpage says it's still true as of 9.0), this means
} that....
} - If B fails while copying back to it, you are back to non-redundant
}    operation on A.
} - If A fails while copying back, you have no operational set.  Your
}    only real option is to pull A and B, connect C alone, and fall back
}    to the state of things as of when you pulled it; then re-add A or B
}    and copyback from C.
} - If C fails while copying from A to B, nothing in particular happens
}    except that you don't have the hot spare you thought you did.
} /~\ The ASCII				  Mouse
} \ / Ribbon Campaign
}  X  Against HTML
} / \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
>-- End of excerpt from Mouse
