Subject: Re: Finding lost components in a raid set?
To: Greg Oster <email@example.com>
From: Johan Ihren <firstname.lastname@example.org>
Date: 02/07/2002 23:27:34
Greg Oster <email@example.com> writes:
> Johan Ihren writes:
> > Greg Oster <firstname.lastname@example.org> writes:
> > Hi Greg,
> > > > Basically it seems that everything has worked very nicely and I'm
> > > > really pleased.
> > Unfortunately I spoke too soon.
> > My re-construction of the "failed" component with a "raidctl -R
> > /dev/wd2g raid1" just finished and raidctl -s now reports happiness.
> > But the disklabel for raid1 is zeroed.
> :( Hmmm...
> > How can that happen? I had two components out of three intact at all
> > times for a 2+1 RAID5 device and I see no reason to lose the label.
> The label should have been quite recoverable... in fact, it should have been
> there even with just 2 components....
> > I have to admit that I did *not* keep a copy of that label in a safe
> > place, which in retrospect seems rather stupid.
> Have a look in /var/backups :)
Whoever put disklabels under RCS control in /var/backups deserves
eternal gratitude. Brilliant! Wonderful!
However, that did not save the day in my particular case. I did indeed
find the label, and I know it was the right label, since I used
nonstandard parameters to newfs:
#        size    offset     fstype [fsize bsize cpg/sgs]
 d: 199999872         0     4.2BSD      0     0       0  # (Cyl. 0 - 260416*)
 e: 199999872         0     4.2BSD   4096 32768     256  # (Cyl. 0 - 260416*)
However, life as we know it was no longer to be found on the raid1
planet. This was once obviously a seriously damaged animal. And now it
bash# fsck /dev/rraid1e
** File system is clean; not checking
bash# mount /dev/raid1e /usr/raid/raid1
bash# df | grep raid1e
/dev/raid1e 99543824 4 94566628 0% /usr/raid/raid1
bash# ls /usr/raid/raid1
ls: /usr/raid/raid1: Bad file descriptor
bash# file /usr/raid/raid1
/usr/raid/raid1: can't stat `/usr/raid/raid1' (Bad file descriptor).
Time to start over. I've had better evenings.
PS. While I was permuting disks and IDE controllers to isolate the
hardware problem, the other raid components were probably attached to
the bad Promise controller at some point. I wasn't mindful of that,
since I didn't know the controller was bad, and because of autoconfig
it didn't matter that the disks were renumbered. My guess is that no
disk was safe while attached to that controller.