netbsd-help: Re: ccd(4) interleave factor

Subject: Re: ccd(4) interleave factor
To: Chris Jones <chris@cjones.org>
From: Jason R Thorpe <thorpej@zembu.com>
List: netbsd-help
Date: 05/12/2001 23:05:18

On Sat, May 12, 2001 at 10:45:12PM -0600, Chris Jones wrote:

 > If you have identical drives, you'd probably be *much* better off
 > using raid(4) instead of ccd(4).  I believe the author of ccd,
 > Jason Thorpe, has said, basically, that it was a cool idea, and
 > nifty technology, but that RAID is really what's designed for the job.
 > (Apologies if I'm misquoting here.)

Mike Hibler at U. of Utah was the original author of the ccd(4) driver (it
was originally called "cd" in the Utah 4.3BSD release for the HP 9000/300).
I did, however, hack on ccd(4) quite a bit :-)

Yah, the raid(4) driver (RAIDframe, originally from the Parallel Data
Lab at CMU) is probably a better choice for your disk striping needs.
It supports RAID 0 (striping), RAID 1 (mirroring), RAID 4 (striping with
reconstruction info stored on a dedicated drive), and RAID 5 (striping
with reconstruction info distributed across all drives).

The ccd(4) (concatenated disk) driver originally was, as the name implies,
a way to concatenate small (in capacity, not in physical size) HP-IB disks
into a larger virtual disk, in order to make them, well, useful :-)

Originally, it filled up an entire component before moving on to the next
one.  That is to say, the interleave factor was "size of component".  Mike
realized that he could generalize some stuff and basically get disk striping
for free, so he did so.  So, while it was useful (what an understatement;
several years ago, I set up an FTP server on an HP 9000/380, the storage
space coming from 4 HP 7937H [~600MB] disks striped together with ccd(4)),
it was never really intended to be a general RAID solution.
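
To make the "generalize and get striping for free" point concrete, here is
a minimal sketch of the block mapping.  It is not the driver's code: the
function name map_block() is made up for illustration, and it assumes all
components are the same size, which the real driver does not require.

```python
def map_block(lbn, interleave, ncomponents):
    """Map a logical block number on the virtual disk to a pair
    (component index, block number within that component).

    Illustrative only: assumes equal-size components.
    """
    # Which interleave-sized chunk the block falls in, and where inside it.
    chunk, within = divmod(lbn, interleave)
    # Chunks are dealt out to the components round-robin.
    row, component = divmod(chunk, ncomponents)
    return component, row * interleave + within


# Interleave == size of a component (1000 blocks here): plain concatenation.
print(map_block(999, 1000, 4))    # (0, 999)  last block of component 0
print(map_block(1000, 1000, 4))   # (1, 0)    first block of component 1

# A small interleave (32 blocks): the same arithmetic now stripes.
print(map_block(33, 32, 4))       # (1, 1)
print(map_block(128, 32, 4))      # (0, 32)   wrapped around to component 0
```

The only thing that changes between the two cases is the interleave value,
which is the generalization described above.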

 > As far as optimum interleaves go, that's the subject of plenty of
 > research.  The bottom line is to try different numbers and see what
 > works best for you.  There are some guidelines, though.  I'll see how
 > many of them I can get right:
 > 
 > > The ccd(4) manpage states: "the optimum interleave factor is
 > > typically the size of a track."  This would be fine, but I'm not
 > > really sure how to determine the size of a track.
 > 
 > The rationale here is probably that a disk can (hopefully) read and
 > transfer a track all at once.  This may be more true for SCSI disks
 > than IDE, but I'm not really sure.  To find the track size, check
 > your disklabel output, or your boot messages (check dmesg(8)).  Keep
 > in mind, however, that nearly all disks you'll encounter nowadays
 > have varying track sizes.  The cylinders near the center of the disk
 > have shorter tracks, because they want to keep the data density roughly
 > the same on all parts of the disk.  But manufacturers make guesses at
 > an optimum number for their disks to report as the "track" size, and
 > it probably won't hurt to take their advice.

Actually, the ccd(4) manpage should probably be revised.  Most disks these
days have an ill-defined geometry: it varies depending on which part of the
disk is involved in the transfer, and some disks *completely* fake it,
having huge internal block sizes and emulating smaller block sizes for the
benefit of operating system software.  So setting up your interleave based
on the geometry is not the best idea, I think.

What you'd like to do is arrange it so that you can do a reasonably large
transfer to as many components as you can for each transfer to the RAID
volume.  So, if we assume the file system is writing 64k chunks to the
RAID, you might want to break that up into 16k or 32k chunks for each
component.  It used to be the case that parallelism was more important
than transfer size (so you would, e.g., interleave across multiple
controllers as well as multiple disks, controller first).  But you really want to
do larger transfers on the disks of today.
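
As a rough way to see the trade-off, the sketch below splits one transfer
into the pieces each component would receive for a given interleave.  The
helper name split_transfer() is hypothetical, sizes are in bytes, and equal-
size components are assumed; it only illustrates the arithmetic and is not
how the driver is written.

```python
def split_transfer(offset, length, interleave, ncomponents):
    """Split a transfer on the virtual disk into per-component pieces.

    Returns a list of (component index, byte offset on component, byte count).
    Illustrative only: assumes equal-size components.
    """
    pieces = []
    end = offset + length
    while offset < end:
        chunk, within = divmod(offset, interleave)
        row, component = divmod(chunk, ncomponents)
        # Stop at the end of this chunk or the end of the transfer.
        count = min(interleave - within, end - offset)
        pieces.append((component, row * interleave + within, count))
        offset += count
    return pieces


KB = 1024

# 64k write, 16k interleave, 4 components: all four disks get 16k each.
for piece in split_transfer(0, 64 * KB, 16 * KB, 4):
    print(piece)

# 64k write, 32k interleave, 4 components: only two disks are touched,
# but each one sees a larger 32k transfer.
for piece in split_transfer(0, 64 * KB, 32 * KB, 4):
    print(piece)
```

Note that, as far as I recall, ccd(4) takes its interleave in DEV_BSIZE
units (usually 512 bytes), so 16k would be given as 32 and 32k as 64; check
the manpage on your system before relying on that.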

-- 
        -- Jason R. Thorpe <thorpej@zembu.com>