Subject: Re: Raidframe experiments and scsi woes
To: Greg Oster <oster@cs.usask.ca>
From: Michael VanLoon <mvanloon@MindBender.serv.net>
List: current-users
Date: 11/26/1998 13:34:27
RAID-5 doesn't give you high performance.  I think to get high
performance *and* high reliability, you need to forget about parity, and
do normal RAID-0 striping on top of pairs of RAID-1 mirrored drives.
This gives you full ccd-like striped write performance, and roughly double
read performance, since reads can be interleaved between the two drives of
each mirror.


Something like this:

                +--------+
             +--| Disk 1 |
             |  +--------+
  +-mirror-1-+
  |          |  +--------+
  |          +--| Disk 2 |
  |             +--------+
  |
  |             +--------+
  |          +--| Disk 3 |
  |          |  +--------+
  +-mirror-2-+
  |          |  +--------+
  R          +--| Disk 4 |
  A             +--------+
--I
  D             +--------+
  0          +--| Disk 5 |
  |          |  +--------+
  +-mirror-3-+
  |          |  +--------+
  |          +--| Disk 6 |
  |             +--------+
  |
  |             +--------+
  |          +--| Disk 7 |
  |          |  +--------+
  +-mirror-4-+
             |  +--------+
             +--| Disk 8 |
                +--------+

It's kind of expensive, since you need two physical drives for each
logical drive.  But it gives you the highest reliability, the fastest
fail-over, and the highest performance, all at the same time.  Just my
theory anyway; I can't afford enough drives to try this (and I'm not
sure which RAID solutions even support striping on top of multiple
mirrors; see the sketch below).

-----Original Message-----
From: Greg Oster <oster@cs.usask.ca>
Date: Wednesday, November 25, 1998 4:04 PM


>Manuel Bouyer writes:
>> I've played a bit with raidframe today. Here are a few results:
>> I used 3 Ultra-wide FUJITSU 8Gb disks on a PII-400MHz. 2 of the disks
>> were on an ahc2940, the last one on an ncr875.
>> Here are the results I got from bonnie:
>> sd5 is the result for a single drive on the ahc, ccd0 and ccd1 for a ccd
>> between the 2 drives of the ahc with an interleave of 32 and 171,
>> respectively; and raid5_0 and raid5_1 for a raid5 array with the 3 drives,
>> with an interleave of 32 and 171 respectively. raid4_0 and raid4_1 are
>> the same tests for a raid4 array.


[Data omitted]

>> raid5 seems to suffer from bad performance, but I already noticed this
>> with "external" raid5 arrays.

>I'm surprised RAID5 is doing so poorly on reads... In your RAID config
>file, what does your "fifo" line look like?  Have you tried playing with
>that?
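For reference, the queueing policy Greg is asking about lives in the
"START queue" section of the config file.  In the sample configs it looks
something like the following, where the number is (as I understand it)
the maximum number of outstanding requests queued per component:

  START queue
  fifo 100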

>> With raid4 it seems possible to achieve performance close to ccd for
>> reading,

>As long as the set is not running in "degraded" mode, it doesn't read the
>parity blocks, and thus can be quite quick.....

>> but writing is worse than raid5 ...

>That's because all of the parity bits are going to the n'th disk, making
>that disk a bottleneck...
>[SCSI bus hangs...]
>I'm hoping that at some point RAIDframe will have some way of better
>communicating with the underlying components.  Of course, if the SCSI
>bus hangs, then there's not much RAIDframe can do...


>> These behaviors prevent any kind of hot-swapping, and raidframe would be
>> much more usable if these were fixed. The ahc case seems to be the simpler
>> one to fix. Maybe a look at the FreeBSD driver would help?
>> Unfortunately I only have temporary access to this test box, so I
>> will not be able to address this in the near future.
>> Now a few comments about the degraded mode of raidframe:
>> After powering off one of the drives, the scsi commands timed out and
>> the console got flooded with messages like:
>> Nov 25 13:06:26 pr7 /netbsd: DEAD DISK BOGUSLY DETECTED!!
>> Nov 25 13:06:26 pr7 /netbsd: [0] node (Rop) returned fail, rolling backward
>> Nov 25 13:06:26 pr7 /netbsd: [0] node (Rrd) returned fail, rolling backward
>> Nov 25 13:06:26 pr7 /netbsd: [0] DAG failure: w addr 0xe2b0 (58032) nblk 0x80 (128) buf 0xf6268000
>> I guess these come from raidframe.

>Yup... RAIDframe gets pretty verbose when it can't find the data it
>wants.. :-(

>> Marking the component as failed doesn't seem to help. The 2 other disks
>> have a lot of activity, but the bonnie process I had is stuck on "getblk".
>> After a reboot, raidframe came up with a component marked "failed" (the
>> disk was still off). I powered on the disk and issued a rescan of the bus.
>> Then I found no way to ask raidframe to reconstruct the data of this disk.
>> I had to mark it as spare in my config file, unconfig/reconfig raid0,
>> and issue a raidctl -F. I think it would be nice to be able to ask
>> raidframe to rebuild a disk directly, for configurations without spares.

>I agree that the procedure for doing this is not very easy..  Direct
>rebuilds are on my "todo" list...
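For anyone trying to reproduce the workaround Manuel describes, I think
it boils down to something like the following.  The device names are
placeholders and the raidctl arguments are from my reading of the man
page, so double-check them:

  # first list the replacement partition in /etc/raid0.conf:
  #   START spare
  #   /dev/sd2f
  raidctl -u raid0                   # unconfigure the degraded set
  raidctl -c /etc/raid0.conf raid0   # reconfigure, now with a spare listed
  raidctl -F /dev/sd2e raid0         # reconstruct the failed component
                                     # onto the spare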


>> When configuring raid0 with a non-existent spare, the config fails, but
>> the components are still marked busy. After fixing the config file,
>> any raidctl -c would fail because of this. I had to reboot.

>You should have been able to "raidctl -u raid0", and then re-configure...
>If something is still marked busy after a config failure, then I've still
>got a bug in there somewhere.. (sounds like I do, and I think I know
>where...)

>> Also, I think a 'raidctl -r' should immediately fail on a device with
>> failed components.

>Yup, it probably should... (it can't reconstruct the parity anyways, as
>it's missing the data blocks from the dead component)

>> For some reason, the box gets hung for a few seconds when doing this.
>> However, even with these SCSI issues, raidframe looks really usable.