Subject: raidframe re-mirroring (cont'd)
To: None <netbsd-users@netbsd.org>
From: Louis Guillaume <lguillaume@berklee.edu>
List: netbsd-users
Date: 08/13/2004 10:40:09
Hi Everyone,

Sorry for the [accidental] duplicate post on current-users; I really 
meant to post here since this is where the prior discussions on this 
issue were...


I posted a few weeks ago about a problem I had with a raid set, where
one disk was failed and I wanted to bring it back online. Here's what
happened...

. Booted into single-user

. Rebuilt all five arrays on the pair of disks (raid0 through raid4,
all raid-1). They're set up like this...

#############################
raid0 raid1 raid2 raid3 raid4

wd0a  wd0e  wd0f  wd0g  wd0b
wd1a  wd1e  wd1f  wd1g  wd1b

/     /usr  /var  /home swap
#############################

. fsck-ed all filesystems and rebooted.

Immediately, I noticed apache2 and spamass-milter fail during startup
(both recently built from pkgsrc and normally very reliable).
Immediately! These same failures are what caused me to believe the
second disk was bad in the first place.

Now I believed that the disk was actually bad and not the kernel/raidframe.

. Rebooted back to single user.
. Failed all wd1 raid components.
. fsck (finds and fixes errors) and reboot again.
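
For reference, the "failed all wd1 components" step amounts to roughly one
raidctl -f per array. A minimal sketch that only prints the commands rather
than running them, assuming the component-to-array pairing from the layout
above (the PAIRS variable and the function name are just for illustration):

```shell
#!/bin/sh
# Sketch only: print (not run) the commands that fail every wd1 component.
# Each entry is "raid-device:wd1-partition", taken from the layout above.
PAIRS="raid0:wd1a raid1:wd1e raid2:wd1f raid3:wd1g raid4:wd1b"

fail_wd1_plan() {
    for pair in $PAIRS; do
        raid=${pair%%:*}    # part before the colon, e.g. raid0
        part=${pair##*:}    # part after the colon, e.g. wd1a
        # raidctl -f marks the named component as failed in that array
        echo "raidctl -f /dev/$part $raid"
    done
}

fail_wd1_plan
```

Checking "raidctl -s raidN" afterwards should show which half of each
mirror is marked failed.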

All is well! For a week and a half, not a hitch.

More reason to believe it's the disk.

. Replaced the suspect disk with another one, disklabeled it, raidctl -a ... etc.

. Incorporated new spare components into arrays.

. Rebooted. raidctl -F ..., fsck, reboot.
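
The add-spare and reconstruct steps above look roughly like the following
per array. This is only a sketch that prints the commands. It assumes the
new disk's partitions keep the wd1 names from the layout above, and that the
failed half shows up as "component1" in "raidctl -s" output; that name is an
assumption and should be checked on the real system first. The variable and
function names are purely illustrative.

```shell
#!/bin/sh
# Sketch only: print (not run) the add-spare / reconstruct sequence per array.
# ASSUMPTION: the failed half is reported as "component1"; verify the
# actual name with "raidctl -s raidN" before running anything like this.
FAILED="component1"
PAIRS="raid0:wd1a raid1:wd1e raid2:wd1f raid3:wd1g raid4:wd1b"

remirror_plan() {
    for pair in $PAIRS; do
        raid=${pair%%:*}    # e.g. raid0
        part=${pair##*:}    # e.g. wd1a
        echo "raidctl -a /dev/$part $raid"   # add the new partition as a hot spare
        echo "raidctl -F $FAILED $raid"      # fail the component, reconstruct onto the spare
        echo "raidctl -S $raid"              # report reconstruction progress
    done
}

remirror_plan
```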

SAME FAILURES as before!! Apache2 and spamass-milter are the first to
go. In the past I had not noticed these right away and kept running.

This is very strange. I'd really like to get my redundancy back. But
once again, I'm running on a set of single-component raid-1 arrays.

Here is some other information that may be useful...

Machine - i386
Problem first noticed at NetBSD-2.0E GENERIC.MP kernel
Still a problem at NetBSD-2.0G GENERIC.MP kernel

I'm guessing my disk is good. The machine runs great on one disk. Weeks
of uptime - even months without a peep. So I'm not thinking that there's
a memory problem as someone suggested earlier.

The only other thing I can think of is perhaps the ribbon cable from the
board to the disk. But if that was bad, wouldn't we have much more
obvious issues?

I don't know if this is a config problem, or something else. But there
definitely is a strange problem that's preventing me from mirroring
successfully.

Perhaps too many raid devices on one pair of disks?
Maybe problems with MP kernel and raidframe?

Any help would be great. Please let me know if I can provide more
information.

Thanks,

Louis