Subject: Re: RAID0 (I know, I know) reconstruction on another drive pair.
To: None <netbsd-help@netbsd.org>
From: Greg Oster <oster@cs.usask.ca>
List: netbsd-help
Date: 08/25/2005 15:54:05
Marc Tooley writes:
> 
> I have a RAID0 partition that has had one of the drives just go partly 
> belly-up in it, and am trying to salvage what I can from the setup. 
> Basically, drive #1 has:
> 
[]
> 
> Drive #2 has:
[]
> 
> I have two other drives that aren't identical, but close (2 x 40GB 
> instead of 2 x 30GB) and I did the following to copy over the bad 
> volumes to the good:
> 
> dd conv=noerror if=/dev/rwd0d ibs=64k | progress -l 30g dd of=/dev/rwd2d 
> obs=64k
> dd conv=noerror if=/dev/rwd1d ibs=64k | progress -l 30g dd of=/dev/rwd3d 
> obs=64k
> 
> ... and this seemed to work. Basically, I was able to run:
> 
> disklabel wd2
> disklabel wd3
> 
> ... and both commands returned useful information after a reboot. I was 
> also able to duplicate the raid.conf file for the first set, for the 
> new set, and run:
> 
> raidctl -c raid1.conf raid1
> 
> ... and then I got an identical disklabel to the original raid0 set 
> from:
> 
> disklabel raid1
> 
> ... my excitement and relief was premature, unfortunately.
> 
> I have two problems now:
> 
> . Mounting the new RAID0 set gives me *all kinds* of problems. Almost 
> every file has a bad type on it: files are special device files, 
> directories are files, strange modes are all over the place.. etc etc.

This sounds like your filesystem has gone south.

> . The second volume set isn't bootable, while the first one is.

My guess is it's a BIOS thing of some sort...  That, or /boot somehow 
got corrupted (or it now can't find /boot)

> . Doing anything useful is difficult because my kernel:
> 
> NetBSD warp 3.99.3 NetBSD 3.99.3
> 
> ... panics at the first sign of corruption. It's been slow going. :(
> 
> My questions:
> 
> 1. Do you have any suggestions for me to rebuild the RAID set on the new 
> volume with the greatest chance of success? Hopefully Mr. Oster might 
> be kind enough to reply to the list. :-)

You might want to try "conv=noerror,sync" with the 'dd'?   I might be
kind enough to reply to the list, but I'm not sure I'm going to be of 
much help :-}
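
The reason "sync" matters: with conv=noerror alone, dd leaves a block it 
cannot read out of the output, so everything after it lands at a lower 
offset on the copy. With sync added, each short or failed input block is 
padded with NULs to the full input block size, so later data keeps its 
original offset. A minimal sketch of the padding behaviour, run on a 
scratch file rather than a real disk (the helper name copy_padded and 
the temporary files are made up for this illustration):

```shell
#!/bin/sh
# copy_padded SRC DST: copy SRC to DST in 64k blocks, continuing past
# read errors and padding any short or unreadable block with NULs.
# (Hypothetical helper name, for illustration only.)
copy_padded() {
    dd if="$1" of="$2" bs=64k conv=noerror,sync 2>/dev/null
}

# Demonstration on a scratch file instead of a raw device:
src=$(mktemp)
dst=$(mktemp)
dd if=/dev/zero of="$src" bs=1000 count=100 2>/dev/null   # 100000-byte file
copy_padded "$src" "$dst"
wc -c < "$dst"    # the short final block is padded up to a 64k boundary
rm -f "$src" "$dst"
```

On the real disks this would amount to adding sync to the conv= list of 
the reading dd in the pipelines quoted above. Note that the padding 
means the copy can end up to one block longer than the source.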

I'd also attempt to do a 'dump' from the original RAID set... (you do 
have backups too, right? :) )

It's sounding like you're missing some metadata bits from the 
original filesystem, and that's going to cause all sorts of grief, 
no matter how you cut it...

> 2. Why didn't dd copy over the bootblocks and make the "clean" set 
> bootable? When I pull the bad drives, the machine insists there's "No 
> operating system." wd0d and wd2d/wd3d all have d partitions that 
> encompass the whole 30GB portion from 0 onwards.

BIOSisms is my guess... that, or /boot (or a pointer to it) is munged.

When a RAID Level 0 set has a component failure, there's not much that 
can be done to recover the missing bits, since RAID 0 keeps no 
redundant copy of the data... and if some of those missing bits are 
metadata, then backups become the prime recovery method...
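
To see why damage on one component shows up all over the filesystem, 
here is a rough sketch of how a striped set spreads logical sectors 
across its components. It assumes a plain round-robin layout, ignores 
any space reserved at the start of each component, and uses made-up 
values (128 sectors per stripe unit, 2 components); the real numbers 
come from the raid.conf in use. The function name stripe_map is 
hypothetical.

```shell
#!/bin/sh
# stripe_map LSECTOR SECT_PER_SU NCOMPONENTS
# Print "component offset" for a logical sector of a striped set,
# assuming simple round-robin placement of stripe units (an assumption,
# not taken from the RAIDframe sources).
stripe_map() {
    lsec=$1
    su=$2
    n=$3
    unit=$(( lsec / su ))                     # stripe unit holding the sector
    comp=$(( unit % n ))                      # component that unit lives on
    off=$(( (unit / n) * su + lsec % su ))    # sector offset on that component
    echo "$comp $off"
}

stripe_map 0 128 2      # prints "0 0"
stripe_map 128 128 2    # prints "1 0"
stripe_map 300 128 2    # prints "0 172"
```

With these values, consecutive 128-sector chunks of the filesystem 
alternate between the two drives, so unreadable areas on one drive 
leave holes throughout the filesystem, inode and directory blocks 
included.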

Later...

Greg Oster