Subject: filesystem read-ahead caching....
To: NetBSD Kernel Technical Discussion List <tech-kern@NetBSD.ORG>
From: Greg A. Woods <woods@weird.com>
List: tech-kern
Date: 05/03/2001 17:53:54
I was reading a bit in "Optimizing UNIX for Performance" today about the
fact that in SunOS (4 & 5) the maxcontig parameter also controls the
number of blocks that are read ahead on each sequential access (at least
when rotdelay is set to zero, and maybe even always).

The SunOS-5.6 mkfs_ufs(1M) manual in fact says:

                                   Note:  This parameter also controls
                                   clustering.  Regardless of the value
                                   of gap, clustering is enabled only
                                   when maxcontig is greater than 1.
                                   Clustering allows higher I/O rates
                                   for sequential I/O and is described
                                   in tunefs(1M).

The SunOS-5 tunefs(1M) manual goes on to say:

     -a maxcontig   Specify  the  maximum  number  of  contiguous blocks
                    that will be laid out before forcing a rotational
                    delay (see -d below).  The default value is 1, since
                    most device drivers require an interrupt per disk
                    transfer.  Device drivers that can chain several
                    buffers together in a single transfer should set
                    this to the maximum chain length.
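
So, going by the manual alone, enabling clustering on a SunOS-5 box
would look something like the following (the device name is just a
placeholder, and the right maxcontig value depends on how long a
transfer chain the driver and controller can actually handle):

	# allow up to 16 contiguous blocks per transfer, no rotational
	# delay; run against the raw device of an unmounted filesystem
	tunefs -a 16 -d 0 /dev/rdsk/c0t0d0s0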

It seems from a cursory reading of the NetBSD kernel source that there's
currently no clustered read-ahead implementation for FFS, though there
does seem to be support in LFS.  Has anyone experimented with adding
cluster_read() hooks to FFS?
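
To make the question concrete, here's a rough sketch of the sort of
hook I have in mind, modelled on the 4.4BSD cluster_read() interface.
The function name is mine, and the field and flag names (i_size,
MNT_NOCLUSTERR) follow 4.4BSD-derived code rather than the current
NetBSD tree, so treat it as pseudo-code for the idea, not a patch:

	#include <sys/param.h>
	#include <sys/buf.h>
	#include <sys/vnode.h>
	#include <sys/mount.h>
	#include <sys/ucred.h>
	#include <ufs/ufs/quota.h>
	#include <ufs/ufs/inode.h>
	#include <ufs/ffs/fs.h>

	/*
	 * Sketch only: fetch one logical block of an FFS file, using
	 * clustered read-ahead when the mount allows it, and falling
	 * back to the current single-block bread()/breadn() otherwise.
	 */
	static int
	ffs_read_block(struct vnode *vp, struct inode *ip, daddr_t lbn,
	    long size, struct buf **bpp)
	{
		struct fs *fs = ip->i_fs;
		daddr_t nextlbn = lbn + 1;
		int nextsize;

		if ((vp->v_mount->mnt_flag & MNT_NOCLUSTERR) == 0)
			/* schedules up to maxcontig blocks of read-ahead */
			return (cluster_read(vp, ip->i_size, lbn, size,
			    NOCRED, bpp));
		if (lblktosize(fs, nextlbn) >= ip->i_size)
			/* last block of the file: nothing to read ahead */
			return (bread(vp, lbn, size, NOCRED, bpp));
		nextsize = blksize(fs, ip, nextlbn);
		/* otherwise read ahead one block, as FFS does today */
		return (breadn(vp, lbn, size, &nextlbn, &nextsize, 1,
		    NOCRED, bpp));
	}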

I'm suddenly interested in this because, after a short (and perhaps not
well-informed) discussion with the vendor of my CMD CRD-5500-based RAID
arrays I was given the following advice:

	Make sure that the chunk size is set to 128 and the read cache
	is set to off.  Otherwise tune via the filesystem.

Now the CRD-5500 does clustered writes and so with enough cache RAM it
should be really efficient even with a chunk-size of 128 (and indeed
Bonnie agrees).  However, various tests I've done so far give less than
stunning results for read access through the filesystem, especially with
the read cache disabled in the RAID controller.  Some Bonnie tests even
show reads to be slower than writes (and re-write just sucks).
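
(For anyone who wants to compare numbers, a Bonnie run of the sort I
mean looks something like this; the scratch directory and machine label
are placeholders, and -s should be at least twice physical RAM so the
buffer cache can't hide the disk:

	bonnie -d /mnt/scratch -s 512 -m crd5500

)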

Also, with a chunk-size of 128 sectors, doing the read-ahead in the
filesystem instead of in the controller might vastly improve read
performance for small files (it wouldn't waste up to 128 sectors of
valuable cache memory on data nobody asked for).
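
(To put numbers on that: assuming 512-byte sectors, a 128-sector chunk
is 64KB, so reading a 2KB file through controller-level read-ahead can
drag some 62KB of unwanted data into the cache.)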

As an aside, does anyone have any other advice about tuning the CRD-5500
and the filesystems to make it perform optimally for CVS, compiling, etc?

-- 
							Greg A. Woods

+1 416 218-0098      VE3TCP      <gwoods@acm.org>     <woods@robohack.ca>
Planix, Inc. <woods@planix.com>;   Secrets of the Weird <woods@weird.com>