Subject: Re: copyin/out
To: Chris Gilbert <chris@paradox.demon.co.uk>
From: Jason R Thorpe <thorpej@wasabisystems.com>
List: port-arm
Date: 08/09/2002 09:16:27
On Fri, Aug 09, 2002 at 10:16:52AM +0100, Chris Gilbert wrote:

 > Quick look over it, do you need to preload the addresses you're storing to?
 > or does that cause it to fetch the tlb entries for speed?  IE aren't you
 > just filling the cache with stuff you're about to overwrite?

On some processors, in certain modes, the cache does not allocate a line
on a write-miss, so you essentially get write-through semantics.  Prefetching
the destination into the cache means you always get write-back semantics,
and lets the cache clean a victim line to make room for that data before
you actually *need* it.
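
To make the idea concrete, here is a minimal C sketch of prefetching the
destination ahead of the stores.  It is not the copyin/copyout code under
discussion: the function name copy_prefetch_dst and the 32-byte LINE_SIZE are
assumptions for illustration, and it relies on the GCC/Clang
__builtin_prefetch builtin, which is only a hint and may compile to nothing on
cores without a prefetch instruction.

```c
#include <stddef.h>

/* Assumed cache-line size in bytes; adjust for the actual core. */
#define LINE_SIZE 32

/*
 * Hypothetical helper: copy len bytes from src to dst, hinting the cache
 * to fetch the NEXT destination line for writing (rw = 1) while the
 * current line is being copied.
 */
void copy_prefetch_dst(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (len >= LINE_SIZE) {
        /* Only prefetch when a full next line lies inside the buffers. */
        if (len >= 2 * LINE_SIZE) {
            __builtin_prefetch(d + LINE_SIZE, 1, 3); /* destination, for write */
            __builtin_prefetch(s + LINE_SIZE, 0, 3); /* source, for read */
        }
        for (size_t i = 0; i < LINE_SIZE; i++)
            d[i] = s[i];
        d += LINE_SIZE;
        s += LINE_SIZE;
        len -= LINE_SIZE;
    }

    /* Copy the tail that is shorter than one line. */
    while (len--)
        *d++ = *s++;
}
```

Whether the destination prefetch helps at all depends on the cache's
write-miss policy, which is the point made above; on a cache that already
allocates on write-miss it may just cost extra memory traffic.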

 > Hmm, I see near enough that already on cats 1.6D.
 > 1073741824 bytes transferred in 17.343 secs (61912115 bytes/sec)

Interesting.  The performance characteristics of the old code were
VERY different on a 400MHz i80321 (XScale core).  Indeed the old code
on my Shark can do:

	1073741824 bytes transferred in 15.120 secs (71014670 bytes/sec)

and the new code on the Shark yields:

	1073741824 bytes transferred in 8.447 secs (127115167 bytes/sec)

That is a SIGNIFICANT improvement.
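
For reference, the two Shark timings above work out to a speedup of roughly
1.79x.  A small sketch of that arithmetic, with hypothetical helper names and
the rounded times taken from the dd output quoted above:

```c
/* Hypothetical helpers for comparing two transfer timings. */

/* Bytes per second for a transfer of `bytes` taking `secs` seconds. */
double throughput(double bytes, double secs)
{
    return bytes / secs;
}

/* Ratio of old elapsed time to new elapsed time for the same transfer. */
double speedup(double old_secs, double new_secs)
{
    return old_secs / new_secs;
}

/*
 * Example use with the Shark figures:
 *   speedup(15.120, 8.447)          old code vs. new code
 *   throughput(1073741824.0, 8.447) new code, bytes/sec
 */
```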

-- 
        -- Jason R. Thorpe <thorpej@wasabisystems.com>