tech-kern archive

Re: RAIDframe component replacement



[Please don't use paragraph-length lines for normal text!]

> I have a RAID1 consisting of sd0a and sd1a.  Now, sd0 sometimes fails
> with "hardware error", but reconstruction onto it is OK.  Of course,
> I want to replace the disc.  Luckily, I have a spare drive and
> everything is hotpluggable SCA and I have unused slots.

> It seems I have two options (given the spare disc I have has already
> been fdisk'ed and disklabel'ed):

> 1. Leave the two current discs in, insert the replacement disc,
> scsictl scan it (becoming sd2) and then add it as a hot spare via
> raidctl -a sd2a.  Then, raidctl -F sd0a which should begin a
> reconstruction on sd2a.

> 2. Do a raidctl -f sd0a (if sd0 hasn't been marked as failed
> already), then scsictl detach it and pull it out.  Then, substitute
> it with the replacement disc, scsictl scan (does it become sd0 then?)
> and raidctl -a sd0a.  Probably I have to raidctl -F component0 again
> in order for the reconstruction to begin.

Actually, as I think someone else pointed out, there's a third option:

3. raidctl -f /dev/sd0a raid0 (if it isn't already failed), pull the
drive, put in the replacement, and raidctl -R /dev/sd0a raid0.  In my
experience, this should work.  The hazards in hot-replacing a SCSI disk
are electrical, which is not an issue if your hardware is designed for
hot-plug, and data, which is not an issue provided sd0 is completely
closed before removal and not opened until the replacement is ready.
Provided it's not used for anything but that RAIDframe member, this
should be the case.
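
As a minimal sketch of option 3, assuming the raid0 and /dev/sd0a names
used above: the run helper, the DRY_RUN switch, and the closing
raidctl -S progress check are illustrative additions, not part of the
procedure as described.  By default it only prints the commands.

```shell
#!/bin/sh
# Sketch of option 3: fail the component, swap the disk, rebuild in place.
RAID=${RAID:-raid0}        # RAID set, as named in the message
COMP=${COMP:-/dev/sd0a}    # component being replaced, as named in the message
DRY_RUN=${DRY_RUN:-1}      # hypothetical switch: 1 prints commands, 0 runs them

# Hypothetical helper: echo the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

replace_in_place() {
    # Mark the component failed (skip by hand if it has already failed).
    run raidctl -f "$COMP" "$RAID"
    if [ "$DRY_RUN" != 1 ]; then
        # Nothing may open the disk between here and the rebuild.
        printf 'Swap the drive, then press Enter: '
        read -r _answer
    fi
    # Rebuild in place onto the replacement disk.
    run raidctl -R "$COMP" "$RAID"
    # Report reconstruction progress (an addition, not from the message).
    run raidctl -S "$RAID"
}
```

Calling replace_in_place with DRY_RUN left at 1 shows the plan without
touching the array; DRY_RUN=0 would run it for real, pausing for the
physical swap.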

> Additionally, I would prefer the procedure that is safer against the
> remaining component (sd1) failing in the middle of it.

None of these can help with that.  Whenever your RAID1 is running
single-member, you lose it if the live member fails.
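
Since everything then rides on the surviving member, it may be worth
confirming its state before pulling anything.  A rough sketch, assuming
raidctl -s prints component lines of the form "/dev/sd1a: optimal";
the degraded_components name is made up for illustration, and the
output format should be checked against your own system.

```shell
# List every component in `raidctl -s` output (read from stdin) whose
# state is not "optimal".  Assumes lines like "   /dev/sd0a: failed".
degraded_components() {
    awk '$1 ~ /^\/dev\// && $2 != "optimal" {
        sub(/:$/, "", $1)   # strip the trailing colon from the device name
        print $1, $2
    }'
}
```

Used as "raidctl -s raid0 | degraded_components": if anything other
than the disk you intend to replace shows up, stop and investigate
first.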

I've done RAID 11 (to coin a phrase), RAID1 atop RAID1.  In my case it
was three-member, not four-member, and this caused some trouble with
autoconfiguration; I've been thinking about possible ways to deal with
that, but haven't implemented anything yet.  It did mean, though, that
we could survive two failures without data damage, not just one.

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mouse%rodents-montreal.org@localhost
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

