Subject: Re: Recommendations...?
To: None <port-alpha@netbsd.org>
From: Paul Mather <paul@gromit.dlib.vt.edu>
List: port-alpha
Date: 03/23/2000 11:08:43
On Wed, 22 Mar 2000, Mason Loring Bliss wrote:

=> 3) Looking at the GENERIC kernel shows only one option for hardware RAID.
=> Can anyone attest to how well it works? Can someone point me to a good
=> description of RAID levels? I'm inclined to do RAID 1, but if I can get
=> to where I can understand exactly how, say, RAID 5 reconstructs lost
=> data with parity from working disks, I'd be more comfortable considering
=> it.

I just set up a RAID 5 myself, yesterday, across three DEC RZ25 drives,
so I don't have any long-term personal reliability data so far.  But, as
of now, it's worked fine, and was straightforward to set up.

A source of information on the RAID levels (aside from "man raid" :) can
be found in the documentation for RAIDframe itself (the NetBSD software
is derived from RAIDframe).  Also, there's an ACM Computing Surveys
article on RAID which is highly informative.  The references are:

William V. Courtright II, Garth Gibson, Mark Holland, LeAnn Neal Reilly,
and Jim Zelenka, "RAIDframe: A Rapid Prototyping Tool for RAID
Systems," Technical Report, Parallel Data Laboratory, School of Computer
Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA,
http://www.pdl.cs.cmu.edu/RAIDframe/RAIDframeBook.ps

(RAIDframe home page is at http://www.pdl.cs.cmu.edu/RAIDframe/)

P. Chen, E. Lee, G. Gibson, R. Katz and D. Patterson,
"RAID: High-Performance, Reliable Secondary Storage," ACM Computing
Surveys, vol. 26, no. 2, June 1994, pp. 145-188


Besides these two, there's a huge literature on RAID. :-)

The tradeoff between RAID levels is between the amount of space you are
willing to give up to redundancy and the amount of parallelism you want
to extract from the disc array.

RAID 1 is fairly simple in concept, but highly redundant (50% space
redundancy!).  Also, you are limited to two-way mirroring under NetBSD,
which, avoiding any hierarchical RAID stacking or ccd'ing, means you'd
be limited to RAID across two drives at most.  If you are going to be
loading up your server with many (> 2) drives, you wouldn't be able to
RAID 1 across them all.
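
To put the 50% figure in context, here is a rough sketch comparing the
usable fraction of raw capacity for a mirror against the single-parity
levels (RAID 4 and 5) discussed next.  The function names are my own,
purely for illustration, and it assumes all drives are the same size.

```python
def usable_fraction_mirror(copies: int = 2) -> float:
    """Fraction of raw capacity holding user data in an n-way mirror."""
    if copies < 2:
        raise ValueError("a mirror needs at least two copies")
    return 1 / copies


def usable_fraction_parity(drives: int) -> float:
    """Fraction of raw capacity holding user data when the equivalent of
    one drive out of `drives` is given over to parity (RAID 4 or 5)."""
    if drives < 2:
        raise ValueError("need at least one data drive plus parity")
    return (drives - 1) / drives


if __name__ == "__main__":
    print(f"two-way mirror: {usable_fraction_mirror():.0%} usable")
    for n in (3, 4, 8):
        print(f"parity, {n} drives: {usable_fraction_parity(n):.0%} usable")
```

The more drives in the parity set, the smaller the share of space lost
to redundancy, whereas a two-way mirror always gives up half.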

RAID 4 and 5 ameliorate the loss of space due to redundancy by computing
parity across "stripe groups" (RAID 1 simply duplicates data
entirely).  So, for RAID 4 and 5, data are arranged in stripe groups
across the drives making up the RAID, with one of the drives being used
to store the parity for the group.  In RAID 4, a dedicated drive is used
to store all the parity data; in RAID 5, the parity is interleaved
throughout all the drives in the RAID, avoiding a single drive becoming
a "parity bottleneck."  So, under RAID 4 and 5, a RAID of N drives uses
the equivalent of N-1 drives for data, and 1 drive for parity, resulting
in a much lower loss of space to redundancy.  The parity scheme for
RAID 4 and 5 is a simple XOR scheme, enabling any single failed disc to
be reconstructed from the remaining discs in the RAID.
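
Since the question was how parity gets the lost data back, here is a
minimal sketch of the XOR idea.  It is not RAIDframe's code: the block
contents, the function names, and in particular the parity rotation in
parity_drive() are made up for illustration (real RAID 5 layouts differ
in how they rotate parity).

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of a list of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_drive(stripe: int, drives: int, rotate: bool) -> int:
    """Drive index holding parity for a given stripe.

    rotate=False models RAID 4 (parity always on the last drive).
    rotate=True models RAID 5 with a hypothetical simple rotation.
    """
    return (drives - 1 - stripe % drives) if rotate else drives - 1


# One stripe group on a four-drive array: three data blocks plus parity.
data = [b"disc", b"raid", b"xor!"]
parity = xor_blocks(data)

# Pretend the drive holding data[1] has failed.  XORing everything that
# survives (the other data blocks and the parity block) yields the
# missing block, because each surviving data block cancels itself out.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
assert rebuilt == data[1]

if __name__ == "__main__":
    for stripe in range(4):
        print(stripe, parity_drive(stripe, 4, False), parity_drive(stripe, 4, True))
```

The same XOR works whichever single drive is lost, parity included,
which is why one drive's worth of parity covers any single failure but
not two at once.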

Another advantage of a larger number of drives in the RAID is the
increased throughput from parallelising I/O across a greater number of
discs.  This helps amortise the cost of seeks, which is a dominating
factor in disc I/O.

Cheers,

Paul.

e-mail: paul@gromit.dlib.vt.edu

"Without music to decorate it, time is just a bunch of boring production
 deadlines or dates by which bills must be paid."
        --- Frank Vincent Zappa