Subject: Re: Finding lost components in a raid set?
To: Johan Ihren <email@example.com>
From: Greg Oster <firstname.lastname@example.org>
Date: 02/07/2002 15:41:56

Johan Ihren writes:
> Greg Oster <email@example.com> writes:
> Hi Greg,
> > > Basically it seems that everything has worked very nicely and I'm
> > > really pleased.
> Unfortunately I spoke too soon.
> My re-construction of the "failed" component with a "raidctl -R
> /dev/wd2g raid1" just finished and raidctl -s now reports happiness.
> But the disklabel for raid1 is zeroed.
> How can that happen? I had two components out of three intact at all
> times for a 2+1 RAID5 device and I see no reason to lose the label.
The label should have been quite recoverable... in fact, it should have been
there even with just 2 components....
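For what it's worth, here is a toy sketch of why two surviving components of a 2+1 RAID 5 set are enough. It is plain XOR parity on a single stripe, not RAIDframe code, and the block contents and the `xor_blocks` helper are made up for illustration:

```python
# Toy illustration of RAID 5 parity for one stripe of a 2+1 set.
# Block contents and sizes are invented for the example.

def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))

data0 = b"disklabel bytes "          # data unit on component 0 (16 bytes)
data1 = b"more file data.."          # data unit on component 1 (16 bytes)
parity = xor_blocks(data0, data1)    # parity unit on component 2

# Component 0 is lost: rebuild its data from the two survivors.
rebuilt0 = xor_blocks(data1, parity)
assert rebuilt0 == data0
```

The same identity holds whichever single component goes missing, which is why the label on raid1 should have survived the loss of wd2g.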
> I have to admit that I did *not* keep a copy of that label in a safe
> place, which in retrospect seems rather stupid.
Have a look in /var/backups :)
> But I regarded the
> underlying device as "safe", in the sense that an event that managed to
> wipe out the label would wipe out the file system data also.
> I also have to admit that I am less happy than 30 minutes ago ;-(
Ya.. no kidding :(
> Losing the disklabel after what should be considered a standard
> replacement of a failed component is not encouraging. But I really
> don't see where I did anything wrong.
You didn't... at least not that I can tell.... (the only way you should have
potentially lost anything here is with 'raidctl -C', and getting the order of
the components wrong...)
> > > But, since I have no indication whatsoever that the disks should have
> > > failed in any way, I cannot help being a bit curious about *both* raid
> > > sets losing a component on the *same* disk without the disk being bad.
> > I don't suppose you have a copy of the disklabel for that disk, do
> > you? The autoconfig should have picked up wd2a if it had a
> > disklabel type of FS_RAID (unless the component label got
> > corrupted). 'raidctl -c' should have found a valid component label
> > for wd2g -- the fact that it's marked as failed indicates that it
> > likely didn't.
> I have the label for wd2 (attached), I don't have the label for raid1,
> as I said above.
> > > I assume that the autoconfigured raid devices keep their configs in
> > > the component labels
> > The configuration info is in the component labels, yes.
> > > or thereabouts and I don't understand how both of
> > > them can have been trashed at the same time.
> > I don't get that either... Was wd2 on the controller card that
> > failed? Maybe it managed to zero the component labels on you??
> > Seems unlikely, but that's the only explanation I can think of so
> > far...
> Yes, wd2 was on the controller that failed.
> > > And if the on-disk
> > > configs were "bad", why didn't a "raidctl -c" fix that, if the media
> > > is ok?
> > "raidctl -c" uses the component labels to verify that you have the
> > components listed in the right order... If a component label is
> > missing (or badly out-of-sync with the remaining components), then
> > that component will be marked as failed. If you managed to get
> > components in the wrong order, the RAID set shouldn't even
> > configure. (It tries quite hard to make sure you don't mess up the
> > ordering, but it needs valid component labels in order to do
> > that. :) )
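As a rough model of the rule described above (a sketch only, not the actual RAIDframe logic), the check could be pictured like this. The `Label` class, its `column` and `mod_counter` fields, the `max_skew` parameter and the status strings are all hypothetical names chosen for the illustration:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Label:
    column: int        # position this component claims in the set (hypothetical)
    mod_counter: int   # grows as the set is modified (hypothetical)


def check_set(labels: List[Optional[Label]], max_skew: int = 0) -> List[str]:
    """Return one status per configured slot; raise if the ordering is wrong.

    labels[i] is the label read from the i-th component listed in the
    config file, or None if no valid label was found there.
    """
    present = [lab for lab in labels if lab is not None]
    if not present:
        raise ValueError("no valid component labels")
    newest = max(lab.mod_counter for lab in present)

    status = []
    for slot, lab in enumerate(labels):
        if lab is None:
            status.append("failed")      # missing label
        elif lab.column != slot:
            # Wrong order: refuse to configure the set at all.
            raise ValueError(f"slot {slot} holds the component for column {lab.column}")
        elif newest - lab.mod_counter > max_skew:
            status.append("failed")      # badly out of sync with the others
        else:
            status.append("optimal")
    return status
```

In this model a zeroed label on the third component simply shows up as `None` in the third slot, so that component is marked failed while the set still configures.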
> I assume that this "component label" is stored adjacent to the device
> it describes. I.e. the component label for /dev/wd2g is located at the
> head of that physical partition?
Yes.. the first 32K is 'reserved'. The component label lives 16K in from
the start of the partition.
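If you want to see whether that area has been zeroed, something along these lines would do. It is a minimal sketch: the 16K offset comes from the layout above, but the number of bytes inspected (`INSPECT_BYTES`) and the function name are my assumptions for the example, and it only reads, never writes:

```python
COMPONENT_LABEL_OFFSET = 16 * 1024   # label lives 16K into the partition
INSPECT_BYTES = 1024                 # assumption: how much to look at


def label_region_is_zeroed(path: str,
                           offset: int = COMPONENT_LABEL_OFFSET,
                           length: int = INSPECT_BYTES) -> bool:
    """Return True if `length` bytes at `offset` in `path` are all zero."""
    with open(path, "rb") as f:      # read-only
        f.seek(offset)
        chunk = f.read(length)
    if len(chunk) < length:
        raise ValueError("short read: partition or file is too small")
    return not any(chunk)
```

You would point it at the component's device node (run as root), or at an image copied off the partition. An all-zero result would fit the theory that something wiped the component label.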
> Here's the disklabel for wd2. It is definitely intact, since it is
> exactly the same as for wd0 and wd1:
Try re-labelling raid1 with the label from /var/backups. Hopefully, however,
whatever's been zeroing stuff on you won't have wrecked the filesystem too :(