current-users: Re: Why my life is sucking. Part 2.

Subject: Re: Why my life is sucking. Part 2.
To: Greywolf <greywolf@starwolf.com>
From: Manuel Bouyer <bouyer@antioche.lip6.fr>
List: current-users
Date: 01/17/2001 21:08:00

On Wed, Jan 17, 2001 at 10:18:57AM -0800, Greywolf wrote:
> [...]
> 
> As far as RAID goes, I think some serious stress-testing needs
> to happen (thanks, Herb, for starting on it; perhaps an individual
> with non-critical data could step forward and take over? :-).

I *did* stress-testing. I have an alpha with 20 9G scsi disks, spread over
4 ncr875, used as raid-1 (and gigabit ethernet). 
Before putting it in production I stressed it very hard, powering down a disk
enclosure while writing, powering it back up; rebooting with the disk
enclosure down, or up, etc ...
I've been able to detect a few problems in raidframe, which Greg fixed,
except 2. The first: you can't rebuild a disk while writing when more than
one disk has failed, or you get FS corruption. This only applies to raid-1
with more than 2 disks, and more than one failed. The second is what I
already talked about: the machine hangs for a while doing a lot of I/O when
starting a disk reconstruct. I can't reproduce it on any machine but my alpha.
Since then (~June) it's been in production as an NFS/SMB server for 60-80
Solaris, Linux and Windows boxes. Not a single problem (except a panic in ffs
fixed by Charles, but hey, it was 1.4Z at the time :), and the raid-1 worked
properly when a disk enclosure failed.

Since then I installed another NFS server (PIII/700) with raid-1, upgraded
my ftp/www/mail server (also official NetBSD mirror) to use raid-1 for root,
upgraded 2 routers to have raid-1 for their boot disks. I also use raid-1
in my IDE testbed (don't know how I could use that much disk otherwise :)
Not a single problem. But I changed the geometry in the disklabel to
more sane values (as already mentioned).

So IMHO raidframe (or at least raid-1) has been pretty well tested, and is
ready for production use.

--
Manuel Bouyer <bouyer@antioche.eu.org>
--