Subject: Re: raidframe problems (revisited)
To: Greg Oster <firstname.lastname@example.org>
From: Louis Guillaume <email@example.com>
Date: 05/29/2007 08:24:50
Greg Oster wrote:
> With the array in degraded mode, can you mount /dev/wd1a (or
> equivalent) as a filesystem, and run a series of stress-tests on
> that, at the same time that you stress the RAID set? Something like:
> foreach i (`jot 1000`)
>   cp src.tar.gz src.tar.gz.$i && rm -f src.tar.gz.$i &
>   sleep 10
>   dd if=/dev/zero of=bigfile.$i bs=10m count=100 && rm -f bigfile.$i &
>   sleep 10
>   dd if=src.tar.gz.$i of=/dev/null bs=10m &
> end
> that end up running on both wd0a and wd1a at the same time. In an
> ideal world, take RAIDframe out of the equation entirely, and push
> the disks, both reads and writes... (If you have an area reserved for
> swap on both, you could disable swap, and use that space). And then
> once the disks are "busy", do something like extract src.tar.gz to
> both wd0a and wd1a, and compare the bits as extracted and see if
> there are differences. (You'll need to tune things so you don't run
> out of space, of course)
This is a great idea, and I'll add it to my list of tests to try to
reproduce the problem.
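For the record, here's roughly how I'd run it. This is a POSIX sh
rendering of your csh loop, not your script verbatim: the knobs are my
additions so the load can be tuned, and I made the last dd read the
original file, since the numbered copy has already been removed by the
time that dd starts. The idea is to run one instance on the degraded
RAID mount and another on the raw component at the same time.

```shell
#!/bin/sh
# Stress one filesystem with overlapping copy, write, and read jobs.
# Start one instance on the degraded RAID-1 mount and another on the
# detached component (e.g. /dev/wd1a mounted directly) simultaneously.
stress_fs() {
    SRC=${SRC:-src.tar.gz}   # existing file to copy around
    ITERS=${ITERS:-1000}     # iterations; tune so space isn't exhausted
    BS=${BS:-10m}            # dd block size (BSD dd suffix syntax)
    COUNT=${COUNT:-100}      # blocks per scratch file (~1 GB at defaults)
    PAUSE=${PAUSE:-10}       # seconds between launching jobs
    i=1
    while [ "$i" -le "$ITERS" ]; do
        # Copy-then-delete churn on the filesystem.
        ( cp "$SRC" "$SRC.$i" && rm -f "$SRC.$i" ) &
        sleep "$PAUSE"
        # Large sequential write, then delete the scratch file.
        ( dd if=/dev/zero of="bigfile.$i" bs="$BS" count="$COUNT" \
              2>/dev/null && rm -f "bigfile.$i" ) &
        sleep "$PAUSE"
        # Sequential read; the numbered copy is already gone, so read
        # the original file instead.
        dd if="$SRC" of=/dev/null bs="$BS" 2>/dev/null &
        i=$((i + 1))
    done
    wait
}
```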
> I suspect it's a drive controller issue (or driver issue) that only
> manifests itself when you push both channels really hard...
Judging from your experience and what others have said about the
stability of raidframe, I highly suspect the controller (or driver) too.
Especially since the RAID-1 set works fine with only one component! It's
not like the system doesn't have the right data in the buffers to write
out to disk. I don't believe the memory is the problem because it's been
What hasn't been tested (by me) is maxing out the i/o on both channels
at the same time. So I will do this next...
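For the extract-and-compare part of your suggestion, I'm thinking of
something like the sketch below. The function name, the mount-point
arguments, and the plain tar/diff approach are my own choices, not
anything from your mail; the assumption is that wd0a and wd1a are
mounted as ordinary filesystems while both channels are being hammered.

```shell
#!/bin/sh
# Extract the same archive onto both disks while they are busy, then
# compare the results bit for bit.  Any difference means data was
# corrupted on the way to or from one of the disks.
extract_and_compare() {
    tarball=$1          # absolute path to e.g. src.tar.gz
    mnt0=$2; mnt1=$3    # directories on wd0a and wd1a respectively
    # Run both extractions concurrently so both channels stay busy.
    ( cd "$mnt0" && tar xzf - ) < "$tarball" &
    ( cd "$mnt1" && tar xzf - ) < "$tarball" &
    wait
    if diff -r "$mnt0" "$mnt1" > /dev/null; then
        echo "identical"
    else
        echo "DIFFERENT"
    fi
}
```

If the trees come back different while RAIDframe is out of the picture
entirely, that would point squarely at the controller or driver.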