Subject: Re: replacing a failed disk in a raidframe raid1 mirror
To: None <netbsd-users@NetBSD.org>
From: Aaron J. Grier <agrier@poofygoof.com>
List: netbsd-users
Date: 08/24/2005 10:12:31
On Wed, Aug 24, 2005 at 09:28:22AM +1000, Carl Brewer wrote:
> raidctl -s says :
> /dev/wd1a status is: failed.  Skipping label.
> 
> This is on a box that has a simple RAID1 mirror for
> its entire disk setup (it's a simple LAN server). It's got
> a pair of Maxtor 80GB HDDs.  See :
> 
> mail: {117} df
> Filesystem  512-blocks     Used     Avail Capacity  Mounted on
> /dev/raid0a     508222   256468    226342    53%    /
> /dev/raid0f    4128988  1569328   2353208    40%    /var
> /dev/raid0e    8258300  3003380   4842004    38%    /usr
> /dev/raid0g  140565428 100568668  32968488    75%    /home
> kernfs               2        2         0   100%    /kern

which raid set has the failed component?  I assume you have all of them
mapped to the same two drives?

> It's a NetBSD 2.0.2 server on i386 hw.  dmesg says this :
> mail: {123} grep ^wd /var/run/dmesg.boot
> wd0 at atabus0 drive 0: <Maxtor 6Y080L0>
[...]
> wd1 at atabus1 drive 0: <Maxtor 6Y080L0>
[...]
> My first question is how do I tell which physical disk is which?  When
> I open up the box, is there some way to identify which disk is wd0 and
> which is wd1?

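wd1 appears to be the master (drive 0) on the second channel (atabus1)
of your controller, while wd0 is the master on the first channel
(atabus0), so the two drives should be on separate cables.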

if you can run your system with the case off you can do
"dd if=/dev/wd0c of=/dev/null bs=1024k" and see which drive lights up.

> Then, is there some howto for rebuilding the array somewhere?  Do I
> basically replicate the steps in 16.3.4 of the guide? :
> http://www.netbsd.org/guide/en/chap-rf.html#chap-rf-second-disk That
> seems a lot of mucking about, is there an easier or better or less
> complex (so less error-prone) way to do it?  I have backups of the box,
> but all the same, I don't want to trash the filesystem and have to
> restore!

there's a section in the raidctl man page entitled "Dealing with
Component Failures" that covers this.

the first thing I'd do, however, would be to find out why raidframe
marked wd1a as bad.  if it's a cabling problem you may be able to fix
that and then reconstruct onto the existing drive.  atactl(8)'s smart
commands may be helpful too, specifically "atactl wd1 smart status".
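a rough sketch of that order of operations, for the archives.  this is
a hypothetical helper, not something from raidctl(8) or the guide: the
name rf-fix.sh, the check/rebuild arguments and the RUN variable are
made up for illustration.  it assumes raid0 is the set (the df output
above only shows raid0 partitions) and that /dev/wd1a is the failed
component, and it only echoes the commands unless you run it as
"RUN= ./rf-fix.sh rebuild".  read raidctl(8) before trusting it with
real data.

```sh
#!/bin/sh
# Hypothetical helper (not from raidctl(8) or the NetBSD guide).
# Assumes the set is raid0 and the failed component is /dev/wd1a on wd1.
# Dry run by default: commands are echoed unless RUN is set to empty.
RUN=${RUN-echo}
DISK=wd1
COMPONENT=/dev/wd1a
SET=raid0

case "${1:-check}" in
check)
	# Why did RAIDframe fail the component?  Ask the drive for its
	# SMART verdict, then show the state of every component in the set.
	$RUN atactl "$DISK" smart status
	$RUN raidctl -s "$SET"
	;;
rebuild)
	# Run only after the cabling is fixed (or the disk is replaced and
	# given a disklabel matching wd0's): fail the component if needed,
	# reconstruct back onto it in place, then report progress.
	$RUN raidctl -R "$COMPONENT" "$SET"
	$RUN raidctl -S "$SET"
	;;
*)
	echo "usage: $0 [check|rebuild]" >&2
	exit 1
	;;
esac
```

splitting it into two steps is deliberate: you want to look at the
"check" output and fix whatever caused the failure before "rebuild"
starts copying wd0 back onto wd1.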

-- 
  Aaron J. Grier | "Not your ordinary poofy goof." | agrier@poofygoof.com
              "silly brewer, saaz are for pils!"  --  virt