Subject: Re: FFS tuning question
To: Joel Votaw <jovotaw@cs.nmsu.edu>
From: Greg Oster <oster@cs.usask.ca>
List: netbsd-users
Date: 10/05/2000 20:55:43
Joel Votaw writes:
> 
> Background:  I'm using RAIDframe's RAID 5 implementation across four 60GB
> IDE drives for bulk storage for a home media server.  The four drives are
> on just two IDE controllers, so two of the drives are IDE masters and two
> are IDE slaves.
> 	Obviously, the combination of calculating and writing parity,
> 5400rpm IDE drives, and hitting both master and slave on the same
> controller does not make for great performance.  I've tried to improve
> performance a little by setting the RAID "chunk" size to 64k (128
> sectors/stripe-unit) in the belief that that means that all reads/writes
> will be 64k in size, which is optimal for IDE controllers.  However, that
> belief is open to debate since http://www.zomby.net/work/ indicates that
> sequential writes are 10 times slower when you get above 16
> sectors/stripe-unit.

You're running into a couple of things here:
 1) with 3 disks for data (and one for parity) for each stripe, it's 
impossible to pick an even "chunk" size such that you aren't always doing
at least one "small write" (i.e. read old data, read old parity, write new 
data, write new parity) per IO.  Using 3 or 5 drives (i.e. 2 or 4 data disks 
per stripe) allows the IO to be split up more evenly, and should yield better 
performance (at least for writes).

 2) With a 64k chunk size, you will always be doing the small write thing
due to limitations on how much data gets handed to the RAID device at a
given time.  Since it has to do 2 reads and 2 writes for each single 
'chunk' that you write, write performance will be only about 1/4 of read 
performance.
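
To put rough numbers on that (assuming a 64k stripe unit on the 4 drives, 
and that the kernel hands the RAID driver at most 64k per transfer):

  full stripe        = 3 data units * 64k = 192k
  largest single IO  = 64k = one stripe unit  -> always a partial stripe
  one 64k write      = read old data + read old parity
                       + write new data + write new parity = 4 IOs
  one 64k read       = 1 IO

so writes end up at roughly 1/4 of read throughput.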

> I'm formatting this as a single filesystem.  I'm using FFS since LFS
> doesn't sound quite stable enough yet, especially on large drives, for my
> comfort.  I'm creating my file system using, I believe,
> 
> 	newfs -c 700 -m 5 -i 65536 -b 32768 -f 4096 -r 5400 /dev/raid0c
> 
> So it has 32k blocks and 4k frags, in addition to my other attempts to
> make it a nice FS on a huge partition.
> 
> 
> My questions are: 
> 
> Would there be a benefit in making the FFS block size equal to the
> stripe-unit size (64k)?

'Tuning' seems to depend quite a bit on the physical disks.  E.g. on my home
box w/ 5 2GB FW SCSI drives, a width of 32 sectors/stripe-unit (16k) seems 
to be about right.  (This also turned out to be the best w/ 50GB U2W SCSI 
drives, 5 per RAID 5 set.)  On a box at work w/ 5 60GB IDE drives, a width 
of 16 (8k) is the best.  (I basically did a bunch of testing w/ different 
sizes, just to find the "sweet spot"...) 

> Can frag sizes reasonably be a size other than 1/8th the size of a block?

Sure.  1/4 is nice too :)
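
E.g. keeping the rest of your flags, something like

  newfs -c 700 -m 5 -i 65536 -b 32768 -f 8192 -r 5400 /dev/raid0c

gives you 32k blocks with 8k frags (the 1/4 ratio).  I haven't benchmarked 
that exact combination, so treat it as a starting point.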

> Can several small files be in frags in the same block, or are there some
> weird semantics with frags to improve performance?  (I dunno, like "you
> can only have the tail of ONE file in a block; the remaining space must be
> separate files"; I'm just making this up.)
> 
> 
> Of course I can test out some different settings, but rebuilding RAID
> parity takes about 6 hours so I want to avoid that as much as possible.

If you're just testing, don't bother doing the 'raidctl -i' until you've 
selected the parameters you want (the RAID set will run fine, with the 
exception that you won't have any real protection of the data).  
Also: If you do something like:

  dd if=/dev/zero of=/dev/raw_raid_component bs=1m

for each of the RAID components *before* you do the 'raidctl -I' and 
'raidctl -i', the parity re-write will go much faster.  (All-zero data 
XORs to all-zero parity, so the zeroing ensures that the parity is already 
correct before the re-write occurs, and the check completes without having 
to adjust the parity at all.  Of course, this trick only works if you have 
no data on the RAID set, and are preparing it for a fresh filesystem.)
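
Putting that together, the sequence I'd use (the wd?e component names, the 
config file name, and the serial number below are just examples; substitute 
your own):

  # zero each component so the parity starts out consistent
  dd if=/dev/zero of=/dev/rwd0e bs=1m
  dd if=/dev/zero of=/dev/rwd1e bs=1m
  dd if=/dev/zero of=/dev/rwd2e bs=1m
  dd if=/dev/zero of=/dev/rwd3e bs=1m

  raidctl -C raid0.conf raid0    # configure the set from your config file
  raidctl -I 2000100501 raid0    # stamp component labels with a serial number
  raidctl -i raid0               # parity re-write; quick over all-zero disks

  # ...then newfs /dev/raid0c as before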

Later...

Greg Oster