Subject: Finding lost components in a raid set?
To: None <current-users@netbsd.org>
From: Johan Ihren <johani@autonomica.se>
List: current-users
Date: 02/07/2002 21:20:26
I've had lots of fun playing with RAIDframe lately, and quite soon I
switched to autoconfigured raid devices to "protect" the devices from
component renumbering.

Since Murphy is the one that is really in control, a Promise IDE
controller card just failed. It took several attempted reboots, card
rearrangements and cable replacements to make sure that it was indeed
the controller card that was at fault.

The Promise card is now removed, the disks rearranged on the remaining
IDE channels and I want to get my raid devices back (two RAID5
devices, both 2+1 with raid0 being a small one for experiments and
raid1 being 120GB with 100+GB live data).

Unfortunately "raidctl -s" reports one component as completely missing
(for both sets):

Components:
        component0: failed
         /dev/wd0a: optimal
         /dev/wd1a: optimal
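
For what it's worth, here is a small sh sketch for pulling the failed
component names out of that kind of status output. The helper name
failed_components is made up for illustration, and the awk pattern
assumes the state is the last word on each component line, as in the
listing above.

```shell
# Hypothetical helper: list components that "raidctl -s" reports as failed.
# Reads the status text on stdin and prints one component name per line.
# Assumes component lines look like "<name>: <state>", as shown above.
failed_components() {
    awk '$NF == "failed" { sub(/:$/, "", $1); print $1 }'
}

# Usage (assumes a configured raid0 device):
#   raidctl -s raid0 | failed_components
```

The name it prints is what would be handed to "raidctl -F" or
"raidctl -R" afterwards.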

This is *not* because the disk failed. The disks are fine: brand new,
with no problems. I fixed my small raid0 device according to the manual:

raidctl -a /dev/wd2a raid0      (i.e. add wd2a *again* as a hot spare, since it was lost)
raidctl -F component0 raid0     (fail component0 and reconstruct onto the spare)

For raid1 I tried to be more clever (since it has a significant amount
of data on it), so I switched off autoconfig first, reordered the
disks in /etc/raid1.conf and rebooted in the hope that the "raidctl -c
/etc/raid1.conf raid1" during boot would find the missing component
even though autoconfig didn't. Didn't work. I ended up with:

Components:
        /dev/wd2g: failed
        /dev/wd0g: optimal
        /dev/wd1g: optimal

Here I gave up and initiated a reconstruction of my raid1 with a
"raidctl -R /dev/wd2g raid1", which will take about two hours to
complete.

Basically it seems that everything has worked very nicely and I'm
really pleased.

But since I have no indication whatsoever that the disks have failed
in any way, I cannot help being a bit curious about *both* raid sets
losing a component on the *same* disk without the disk being bad.

I assume that the autoconfigured raid devices keep their configs in
the component labels or thereabouts and I don't understand how both of
them can have been trashed at the same time. And if the on-disk
configs were "bad", why didn't a "raidctl -c" fix that, if the media
is ok?
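
One way to look into this might be to dump the component label of each
disk and compare them by eye. A minimal sketch, assuming "raidctl -g
component dev" prints the label as described in raidctl(8). The
show_labels helper and the RAIDCTL override variable are made up for
illustration.

```shell
# Hypothetical helper: print the component label of each listed component.
# First argument is the RAID device, the rest are component devices.
# RAIDCTL may be set to another command (e.g. echo) for a dry run.
show_labels() {
    dev=$1; shift
    for comp in "$@"; do
        echo "== $comp =="
        ${RAIDCTL:-raidctl} -g "$comp" "$dev"
    done
}

# Usage (assumes raid1 is configured):
#   show_labels raid1 /dev/wd0g /dev/wd1g /dev/wd2g
```

A label on wd2g that differs from the other two would at least show
where the mismatch is.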

Just curious,

Johan Ihren