current-users: Re: Finding lost components in a raid set?

Subject: Re: Finding lost components in a raid set?
To: Greg Oster <oster@cs.usask.ca>
From: Johan Ihren <johani@autonomica.se>
List: current-users
Date: 02/07/2002 23:27:34
Greg Oster <oster@cs.usask.ca> writes:

> Johan Ihren writes:
> > Greg Oster <oster@cs.usask.ca> writes:
> > 
> > Hi Greg,
> > 
> > > > Basically it seems that everything has worked very nicely and I'm
> > > > really pleased.
> > 
> > Unfortunately I spoke too soon. 
> > 
> > My reconstruction of the "failed" component with "raidctl -R
> > /dev/wd2g raid1" just finished, and raidctl -s now reports happiness.
> > 
> > But the disklabel for raid1 is zeroed.
> 
> :( Hmmm...
> 
> > How can that happen? I had two out of three components intact at all
> > times on a 2+1 RAID5 device, and I see no reason for the label to be lost.
> 
> The label should have been quite recoverable... in fact, it should have been 
> there even with just 2 components....
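Why two components should be enough: in a 2+1 RAID5 set, the parity block of each stripe is the XOR of its two data blocks, so whichever single block is missing can be recomputed by XOR-ing the two that survive. The Python sketch below illustrates only that principle. The `xor_blocks` helper and the block contents are hypothetical, and the sketch ignores RAIDframe's stripe units and the way parity rotates across components.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Return the byte-wise XOR of equal-length blocks (hypothetical helper)."""
    if not blocks or len({len(b) for b in blocks}) != 1:
        raise ValueError("need one or more blocks of equal length")
    # zip(*blocks) yields one tuple of byte values per column.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One hypothetical stripe on a 2+1 RAID5 set: two data blocks plus parity.
data0 = b"label sector...."  # stand-in for the sector holding the disklabel
data1 = b"other file data."
parity = xor_blocks(data0, data1)

# Lose any one component and rebuild its block from the two survivors.
assert xor_blocks(data1, parity) == data0
assert xor_blocks(data0, parity) == data1
assert xor_blocks(data0, data1) == parity
print("every single-component loss is recoverable")
```

Conceptually, this is what a `raidctl -R` rebuild does for every stripe of the component being reconstructed, which is why the label should have survived as long as only one component was bad at a time.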
> 
> > I have to admit that I did *not* keep a copy of that label in a safe
> > place, which in retrospect seems rather stupid.
> 
> Have a look in /var/backups :)

Whoever put disklabels under RCS control in /var/backups deserves
eternal gratitude. Brilliant! Wonderful!

However, that did not save the day in my case. I did find the label,
and I know it was the right one, since I used nonstandard parameters
to newfs:

8 partitions:
#        size    offset     fstype  [fsize bsize cpg/sgs]
 d: 199999872         0     4.2BSD      0     0     0   # (Cyl.    0 - 260416*)
 e: 199999872         0     4.2BSD   4096 32768   256   # (Cyl.    0 - 260416*)

However, life as we know it was no longer to be found on the raid1
planet. At some point this had obviously been a seriously damaged
animal, and now it is dead:

bash# fsck /dev/rraid1e
** /dev/rraid1e
** File system is clean; not checking
bash# mount /dev/raid1e /usr/raid/raid1 
bash# df | grep raid1e
/dev/raid1e  99543824        4  94566628     0%    /usr/raid/raid1
bash# ls /usr/raid/raid1 
ls: /usr/raid/raid1: Bad file descriptor
bash# file /usr/raid/raid1 
/usr/raid/raid1: can't stat `/usr/raid/raid1' (Bad file descriptor).
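The numbers in this transcript can be cross-checked against the recovered label. The small Python calculation below assumes 512-byte sectors, that df was reporting 1K blocks, and that newfs used the default 5% minfree reserve; none of these is stated in the message. Under those assumptions partition e comes to 99999936 KiB, the df total is about 0.46% smaller (presumably space newfs set aside for superblocks, cylinder-group headers and inode tables), and the avail column equals 95% of the total minus the 4 KiB in use. So the label fits the file system that mounted, while that file system's own summary claims almost nothing is stored on it.

```python
# Assumptions (not from the original message): 512-byte sectors, df output
# in 1K blocks, and the default 5% minfree reserve.
SECTOR_BYTES = 512
MINFREE_PCT = 5

label_e_sectors = 199_999_872  # size of partition e in the recovered label
label_kib = label_e_sectors * SECTOR_BYTES // 1024

# Figures from the df line for /dev/raid1e.
df_total_kib = 99_543_824
df_used_kib = 4
df_avail_kib = 94_566_628

# Space between the raw partition and the usable file system size.
overhead_kib = label_kib - df_total_kib
overhead_pct = 100 * overhead_kib / label_kib

# With a minfree reserve, "avail" is roughly (100 - minfree)% of the total
# minus what is already in use.
expected_avail_kib = df_total_kib * (100 - MINFREE_PCT) // 100 - df_used_kib

print(f"label size:  {label_kib} KiB")
print(f"df total:    {df_total_kib} KiB ({overhead_pct:.2f}% smaller)")
print(f"df used:     {df_used_kib} KiB")
print(f"avail check: expected {expected_avail_kib}, df says {df_avail_kib}")
```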

Time to start over. I've had better evenings.

Johan

PS. While I was permuting disks and IDE controllers to isolate the
hardware problem, the other RAID components were probably attached to
the bad Promise controller at some point. I didn't pay attention to
that, since I didn't know the controller was bad, and because of
autoconfig it didn't matter that the disks were renumbered. My guess
is that no disk was safe while attached to that controller.