Subject: Re: Raid1 Disk Failure - Diagnosing/Repairing and/or Replacing disk
To: None <yancm@sdf.lonestar.org>
From: Patrick Welche <prlw1@newn.cam.ac.uk>
List: netbsd-users
Date: 09/16/2005 18:52:42
On Fri, Sep 16, 2005 at 12:14:36PM -0500, yancm@sdf.lonestar.org wrote:
> I'm relatively new to RAID. Earlier this year I successfully moved my
> home/office NetBSD 2.0-Stable system onto a RAID 1 setup on a 300 MHz
> PC/IDE bus. I confirmed everything worked, and it has been working great.
> 
> 2 days ago I got a message that one of the disks failed.
> 
> # raidctl -s raid0
> Components:
>            /dev/wd0a: failed
>            /dev/wd2a: optimal
> No spares.
> /dev/wd0a status is: failed.  Skipping label.
> Component label for /dev/wd2a:
>    Row: 0, Column: 1, Num Rows: 1, Num Columns: 2
>    Version: 2, Serial Number: 2147483647, Mod Counter: 222
>    Clean: No, Status: 0
>    sectPerSU: 128, SUsPerPU: 1, SUsPerRU: 1
>    Queue size: 100, blocksize: 512, numBlocks: 156301312
>    RAID Level: 1
>    Autoconfig: Yes
>    Root partition: Yes
>    Last configured as: raid0
> Parity status: clean
> Reconstruction is 100% complete.
> Parity Re-write is 100% complete.
> Copyback is 100% complete.
> 
> I've searched the mailing lists and the manual and am a bit wary of
> optimal next steps.
> 
> I've ordered a 3rd disk of the same size (80G) and model number just to be
> safe, but...
> 
> Q1: Is there any way to diagnose the disk live in the system? If so, what
> are the steps and commands I'd need?
> 
> Q2: I think I can fumble my way through the manual, but it doesn't seem to
> describe what you need to do to replace a failed disk. Again, a logical
> sequence of steps and commands would be greatly appreciated.

How about the bit in raidctl(8) starting at "Dealing with Component Failures"?
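
Before swapping anything, it may help to confirm which component RAIDframe
considers failed. The following is only a minimal sketch: failed_components
is a made-up helper name, and it assumes the raidctl -s output has the same
layout as the status quoted above.

```shell
# Hypothetical helper: read `raidctl -s` output on stdin and print the
# name of every component listed as failed in the "Components:" section.
failed_components() {
    awk '
        /^Components:/          { in_comp = 1; next }   # section starts
        /^(No spares|Spares:)/  { in_comp = 0 }         # section ends
        in_comp && $2 == "failed" {
            sub(/:$/, "", $1)   # drop the trailing colon from the name
            print $1
        }
    '
}
```

It would be used as: raidctl -s raid0 | failed_components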


Swap out wd0, then check in dmesg that wd0 really is the new drive. I don't
know whether this is necessary or not, but I would disklabel wd0 at this
point. Hot-add the new disk as a spare, as that section of the man page
describes:

raidctl -a /dev/wd0a raid0

after which raidctl -s raid0 will say something like

Components:
          component0: failed
           /dev/wd2a: optimal
Spares:
           /dev/wd0a: spare


Then

raidctl -F component0 raid0
raidctl -s raid0
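
For reference, the whole sequence can be sketched as a small dry-run script.
This is only a sketch under assumptions: the device names are the ones from
this thread, the helper names run and replace_component are made up for
illustration, and the raidctl -a hot-add step is taken from the "Dealing
with Component Failures" section of the raidctl man page. By default it only
prints the commands; set RUN=1 to actually execute them.

```shell
# Assumed names from this thread; override via the environment if needed.
: "${RAID:=raid0}"          # the RAID device
: "${NEWDISK:=/dev/wd0a}"   # partition on the replacement disk
: "${FAILED:=component0}"   # name raidctl -s shows for the failed slot

# Hypothetical helper: print the command unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# Hypothetical helper: the replacement steps, in order.
replace_component() {
    run raidctl -a "$NEWDISK" "$RAID"   # hot-add the new disk as a spare
    run raidctl -F "$FAILED" "$RAID"    # reconstruct onto the spare
    run raidctl -S "$RAID"              # show reconstruction progress
    run raidctl -s "$RAID"              # final status check
}
```

Calling replace_component with no RUN set just lists the four raidctl
commands, which makes it easy to check the device names before committing.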

Cheers,

Patrick