Subject: Re: new mips cache performance
To: Simon Burge <simonb@wasabisystems.com>
From: Jason R Thorpe <thorpej@wasabisystems.com>
List: port-mips
Date: 11/18/2001 10:06:41

On Mon, Nov 19, 2001 at 01:00:36AM +1100, Simon Burge wrote:

 > I've tried a simple benchmark of building a pmax kernel three times in a
 > row with kernels built both pre-merge and post-merge of the mips cache
 > branch (using the "2001-11-14 18:00:00 UTC" and "2001-11-15 12:20:00
 > UTC" date tags).  The post-merge figures were slightly worse (about 30
 > seconds slower over an hourish build).  At Jason's suggestion, I removed
 > the check for an L2 cache in the __mco_noargs and __mco_2args macros
 > in <mips/cache.h> so that it always called the L2 cache ops and that
 > shaved about 108 seconds off the average benchmark time making it about
 > 75 seconds quicker than a pre-merge kernel.

Hm, okay.  I'm kind of annoyed that the test is that expensive :-/

So, a couple of options here:

	(1) Make noop L2 cache routines for platforms which don't
	    have them, and always let the code jump into the L2
	    routine.

	(2) Do the pseudo-vector thing.  Since the individual cache
	    primitives are too large to stuff into the pvecs themselves,
	    we would have to copy the cache-op call sites into the pvecs.
	    This would mean stack allocation, saving some regs, etc. in
	    the pvecs.

I'm leaning towards (1), since, with the cache ops staying macros, the
compiler would have an easier time optimizing the code around the call
sites.
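
Roughly what I have in mind for (1), as a sketch only.  The struct,
macro and function names below are made up for illustration and are
not the ones in <mips/cache.h>; the counters stand in for the real
cache primitives so the thing can be compiled and poked at in userland.

```c
#include <stddef.h>

/* Hypothetical op table; field names are illustrative only. */
struct cache_ops {
	void (*l1_wbinv_all)(void);
	void (*l2_wbinv_all)(void);
};

/* Counters stand in for real cache instructions in this sketch. */
static unsigned int l1_calls, l2_calls;

static void demo_l1_wbinv_all(void) { l1_calls++; }
static void demo_l2_wbinv_all(void) { l2_calls++; }

/* No-op installed on platforms that have no L2 cache. */
static void l2_noop_all(void) { }

/*
 * Fill in the table once at attach time.  The "is there an L2?"
 * decision is made here, not at every call site.
 */
static void
cache_ops_init(struct cache_ops *ops, int have_l2)
{
	ops->l1_wbinv_all = demo_l1_wbinv_all;
	ops->l2_wbinv_all = have_l2 ? demo_l2_wbinv_all : l2_noop_all;
}

/* Call site: no test for an L2 cache, always jump into the L2 op. */
#define CACHE_WBINV_ALL(ops)						\
do {									\
	(ops)->l1_wbinv_all();						\
	(ops)->l2_wbinv_all();						\
} while (/*CONSTCOND*/0)
```

The cost on L2-less machines is one indirect call that returns
immediately, in exchange for dropping the test from every call site.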

 > The tests were run on a DECsystem 5000/260 (R4400 at 60MHz, 16k L1
 > Icache, 16k L1 Dcache and 1MB L2 cache, 192MB RAM) with source and
 > kernel compile directory on local disk, using a GENERIC kernel that
 > has both the MIPS1 and MIPS3 options.

BTW, what's the line size of our L2 cache?  128 bytes?  We could probably
squeeze some more out by writing 128-byte optimized L2 cache ops (which
unroll the loop somewhat).
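
Something like the following is the shape I mean, as a sketch only.
It assumes the line size really is 128 bytes (still an open question),
and l2_line_op() is a made-up stand-in for the per-line cache
instruction so this compiles and runs in userland; the unroll factor
of 4 is arbitrary.

```c
#include <stddef.h>
#include <stdint.h>

#define L2_LINE_SIZE	128u	/* assumed, not confirmed */

/* Counts per-line operations; stands in for the real cache op. */
static unsigned long l2_line_ops;

static void
l2_line_op(uintptr_t va)
{
	(void)va;
	l2_line_ops++;
}

/* Hit every 128-byte line overlapping [va, va + size). */
static void
l2_wbinv_range_128(uintptr_t va, size_t size)
{
	uintptr_t start, end;

	if (size == 0)
		return;

	/* Round the range out to whole cache lines. */
	start = va & ~(uintptr_t)(L2_LINE_SIZE - 1);
	end = (va + size + L2_LINE_SIZE - 1) &
	    ~(uintptr_t)(L2_LINE_SIZE - 1);

	/* Unrolled body: four lines (512 bytes) per iteration. */
	while (end - start >= 4 * L2_LINE_SIZE) {
		l2_line_op(start + 0 * L2_LINE_SIZE);
		l2_line_op(start + 1 * L2_LINE_SIZE);
		l2_line_op(start + 2 * L2_LINE_SIZE);
		l2_line_op(start + 3 * L2_LINE_SIZE);
		start += 4 * L2_LINE_SIZE;
	}

	/* Leftover lines, one at a time. */
	while (start < end) {
		l2_line_op(start);
		start += L2_LINE_SIZE;
	}
}
```

Hard-coding the line size lets the compiler fold the offsets into
immediates, which is where I'd expect any win to come from.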

-- 
        -- Jason R. Thorpe <thorpej@wasabisystems.com>