Subject: Re: RaidFrame - Failed Partition on one disk
To: Chris Cameron <chris@onemind.com>
From: Greg Oster <oster@cs.usask.ca>
List: netbsd-help
Date: 07/23/2003 15:29:08
"Chris Cameron" writes:
> Hi,
>
> I have a Raid1 setup on NetBSD 1.6 which reported to me that I had a failed
> component today.
>
> I don't think that the disk has failed though, as another partition on that
> same disk is still functioning fine. Is there a way to rebuild the bad
> component on the failed raid partition?
You can do that with:
raidctl -R /dev/wd1a raid0
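As a rough sketch of how that rebuild might be wrapped up, assuming the failed component is /dev/wd1a as in the status output quoted below: the function name rebuild_in_place and the RAIDCTL override variable are made up for illustration, while -R and -S are the standard raidctl options for in-place reconstruction and progress reporting.

```shell
#!/bin/sh
# Sketch only: rebuild a failed component in place, then report progress.
# RAIDCTL is a hypothetical override (handy for a dry run); it defaults
# to the real raidctl binary.
RAIDCTL=${RAIDCTL:-raidctl}

rebuild_in_place() {
    component=$1   # the component marked "failed", e.g. /dev/wd1a
    raid=$2        # the RAID set it belongs to, e.g. raid0
    # -R fails the component if necessary and reconstructs back onto it.
    $RAIDCTL -R "$component" "$raid" || return 1
    # -S reports reconstruction, parity re-write and copyback progress.
    $RAIDCTL -S "$raid"
}
```

It would be called as "rebuild_in_place /dev/wd1a raid0". Nothing runs until the function is called, so the file can be read through safely first.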
> (I will be verifying that the disk
> is in good condition, but I need to do that outside of office hours).
>
> Below are the results from raidctl -s raid0 and raid2 (the 2 raid partitions
> I have)
>
> server# raidctl -s raid0
> Components:
> /dev/wd0a: optimal
> /dev/wd1a: failed
> No spares.
> Component label for /dev/wd0a:
> Row: 0, Column: 0, Num Rows: 1, Num Columns: 2
> Version: 2, Serial Number: 20021100, Mod Counter: 341
> Clean: No, Status: 0
> sectPerSU: 128, SUsPerPU: 1, SUsPerRU: 1
> Queue size: 100, blocksize: 512, numBlocks: 1088512
> RAID Level: 1
> Autoconfig: Yes
> Root partition: Yes
> Last configured as: raid0
> /dev/wd1a status is: failed. Skipping label.
> Parity status: DIRTY
> Reconstruction is 100% complete.
> Parity Re-write is 100% complete.
> Copyback is 100% complete.
>
> server# raidctl -s raid2
> Components:
> /dev/wd0e: optimal
> /dev/wd1e: optimal
> No spares.
> Component label for /dev/wd0e:
> Row: 0, Column: 0, Num Rows: 1, Num Columns: 2
> Version: 2, Serial Number: 20021102, Mod Counter: 202
> Clean: No, Status: 0
> sectPerSU: 128, SUsPerPU: 1, SUsPerRU: 1
> Queue size: 100, blocksize: 512, numBlocks: 74970112
> RAID Level: 1
> Autoconfig: Yes
> Root partition: No
> Last configured as: raid2
> Component label for /dev/wd1e:
> Row: 0, Column: 1, Num Rows: 1, Num Columns: 2
> Version: 2, Serial Number: 20021102, Mod Counter: 202
> Clean: No, Status: 0
> sectPerSU: 128, SUsPerPU: 1, SUsPerRU: 1
> Queue size: 100, blocksize: 512, numBlocks: 74970112
> RAID Level: 1
> Autoconfig: Yes
> Root partition: No
> Last configured as: raid2
> Parity status: clean
> Reconstruction is 100% complete.
> Parity Re-write is 100% complete.
> Copyback is 100% complete.
>
> Am I correct in thinking that perhaps /dev/wd1a has been corrupted in some
> manner and just needs to be rebuilt, since /dev/wd1e is still in optimal
> state?
My guess is that wd1 has a physical error that is showing up in the
'a' partition of the disk. Partition 'e' doesn't have any errors
(yet), but I wouldn't use that as a good reason for thinking that 'a'
will be fine after a rebuild. Your first step is to check
/var/log/messages and try to find out what actually happened to
wd1 to cause the error...
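One possible way to pull the relevant lines out of the log, as a minimal sketch: the function name disk_messages is hypothetical, and the pattern simply assumes the disk shows up in the log as wd1 or as one of its partitions (wd1a, wd1e, ...).

```shell
#!/bin/sh
# Sketch only: print log lines that mention a given disk.
# Matches the bare disk name and its partitions (wd1, wd1a, wd1e, ...)
# without also matching a different disk such as wd10.
disk_messages() {
    disk=$1
    log=${2:-/var/log/messages}   # log file to search
    grep -E "(^|[^[:alnum:]])${disk}[a-p]?([^[:alnum:]]|\$)" "$log"
}
```

For example, "disk_messages wd1" searches /var/log/messages, and a second argument can point it at a rotated log instead.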
> If so, how would I do such a thing?
See above... If, in fact, wd1a got marked as "failed" because of a
read error, you may be able to successfully reconstruct to the
same disk if the drive re-maps the bad block(s). But you'll want to
make sure you know why the disk failed, and then decide whether
you need a replacement.
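After a reconstruct, it may be worth confirming that no component is still marked as failed. The sketch below assumes the "raidctl -s" output looks like the listing quoted earlier in this message (a "/dev/...: failed" line under "Components:"); the function name failed_components is made up for illustration.

```shell
#!/bin/sh
# Sketch only: read "raidctl -s" output on stdin and print the device
# name of every component whose status line ends in "failed".
# Lines such as "... status is: failed. Skipping label." are ignored,
# because they do not end right after the word "failed".
failed_components() {
    sed -n 's|^[[:space:]]*\(/dev/[^:]*\): failed[[:space:]]*$|\1|p'
}
```

Typical use would be "raidctl -s raid0 | failed_components", which prints nothing once every component is back to optimal.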
Later...
Greg Oster