Subject: Re: Finding lost components in a raid set?
To: Johan Ihren <firstname.lastname@example.org>
From: Greg Oster <email@example.com>
Date: 02/07/2002 16:37:16
Johan Ihren writes:
> Greg Oster <firstname.lastname@example.org> writes:
> > Johan Ihren writes:
> > > Greg Oster <email@example.com> writes:
> > >
> > > Hi Greg,
> > >
> > > > > Basically it seems that everything has worked very nicely and I'm
> > > > > really pleased.
> > >
> > > Unfortunately I spoke too soon.
> > >
> > > My re-construction of the "failed" component with a "raidctl -R
> > > /dev/wd2g raid1" just finished and raidctl -s now reports happiness.
> > >
> > > But the disklabel for raid1 is zeroed.
> > :( Hmmm...
> > > How can that happen? I had two components out of three intact at all
> > > times for a 2+1 RAID5 device and I see no reason to lose the label.
> > The label should have been quite recoverable... in fact, it should have been
> > there even with just 2 components....
> > > I have to admit that I did *not* keep a copy of that label in a safe
> > > place, which in retrospect seems rather stupid.
> > Have a look in /var/backups :)
> Whoever put disklabels under RCS control in /var/backups deserves
> eternal gratitude. Brilliant! Wonderful!
You'll find another thread (titled "*whew*") on -current users, which is
exactly about this :)
> However that did not save the day in my particular case. I did indeed
> find the label and I know it was the right label, since I used
> nonstandard parameters to newfs:
> 8 partitions:
> #        size   offset    fstype   [fsize bsize cpg/sgs]
>  d: 199999872        0    4.2BSD        0     0     0   # (Cyl. 0 - 260416
>  e: 199999872        0    4.2BSD     4096 32768   256   # (Cyl. 0 - 260416
> However, life as we know it was no longer to be found on the raid1
> planet. This was once obviously a seriously damaged animal. And now it
> is dead:
> bash# fsck /dev/rraid1e
> ** /dev/rraid1e
> ** File system is clean; not checking
> bash# mount /dev/raid1e /usr/raid/raid1
> bash# df | grep raid1e
> /dev/raid1e 99543824 4 94566628 0% /usr/raid/raid1
> bash# ls /usr/raid/raid1
> ls: /usr/raid/raid1: Bad file descriptor
> bash# file /usr/raid/raid1
> /usr/raid/raid1: can't stat `/usr/raid/raid1' (Bad file descriptor).
Yuck. I suspect doing 'fsck -f /dev/rraid1e' would yield a whole bunch of
> Time to start over. I've had better evenings.
> PS. During my permutations of disks and IDE controllers when trying to
> isolate the hardware problem the other raid components were likely at
> some time located on the bad Promise controller. I wasn't mindful of
> that since I didn't know that the controller was bad and because of
> autoconfig it didn't matter that disks were renumbered. And my guess
> is that no disk was safe when attached to that controller.
Ya... all they need to do is scribble stuff over the 'good' disks, and it
doesn't matter how much RAID you have... :( Once 2 disks in the RAID 5 set
got scribbled on (even randomly), it's game over :(
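
To illustrate why one bad component is survivable but two are not, here is a
minimal sketch of single-parity (RAID 5 style) reconstruction for one stripe
of a 2+1 set. It is not RAIDframe's code: the helper names (xor_blocks,
rebuild) and the sample block contents are made up for illustration, and real
sets rotate parity across components and work on fixed-size stripe units.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(stripe: list, missing: int) -> bytes:
    """Rebuild the component at index `missing` from all the other components.

    This is only correct if every other component in the stripe is intact.
    """
    survivors = [blk for i, blk in enumerate(stripe) if i != missing]
    return xor_blocks(*survivors)


# Hypothetical stripe: two data blocks plus their parity (a 2+1 layout).
d0 = b"disklabel"
d1 = b"filesyste"
parity = xor_blocks(d0, d1)
stripe = [d0, d1, parity]

# One component lost: the other two are enough to get it back.
one_lost_ok = rebuild(stripe, 0) == d0

# Two components scribbled on: rebuilding d0 now uses a damaged parity
# block, so the "reconstructed" data is garbage.
scribbled = [b"\x00" * 9, d1, b"\xff" * 9]
two_lost_ok = rebuild(scribbled, 0) == d0

print("one component lost, recovered:", one_lost_ok)
print("two components scribbled, recovered:", two_lost_ok)
```

Note that the rebuild has no way of knowing its inputs were scribbled on; it
just XORs whatever is there, which is why a reconstruction can "succeed" and
still leave garbage behind.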