Subject: Re: Kernel copyin/out optimizations for ARM...
To: None <>
From: David Laight <>
List: port-arm
Date: 03/16/2002 21:50:45
On Sat, Mar 16, 2002 at 01:06:01PM +0000, David Laight wrote:
> > 
> > One idea though .... shouldn't we use `strt' _ONCE_ every page? i.e. either 
> > the whole page is OK or the whole page is not .... doing it multiple times 
> > is a bit silly....
> True - you either need to do it for every access, or once a page.
> To do it once a page you need nested loops - more scope for bugs.
> (and another 2 registers have to be saved - maybe not... wicked thought)
> - I feel a third version coming along....

OK, today's code - and I'm definitely not playing with it again!
It copies the aligned chunk of each page with ldmia/stmia.
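
For anyone following the once-a-page idea without the assembler to
hand, here is a rough C sketch of the nested-loop shape. It is not the
posted code. COPY_PAGE_SIZE, page_ok() and copy_by_page() are names
made up for illustration, the 4 KiB page size is an assumption, and
page_ok() is only a stub standing in for whatever per-page access
check the kernel does (the thread talks about using `strt' for that).
memcpy() stands in for the inner ldmia/stmia loop.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define COPY_PAGE_SIZE 4096u    /* assumption: 4 KiB pages */

static size_t page_checks;      /* how many per-page checks were made */

/* Hypothetical stub: a real kernel would probe the page here. */
static int page_ok(uintptr_t page_base)
{
    (void)page_base;
    page_checks++;
    return 1;
}

/* Copy len bytes, checking each destination page exactly once. */
int copy_by_page(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (len > 0) {                       /* outer loop: one pass per page */
        uintptr_t off = (uintptr_t)d & (COPY_PAGE_SIZE - 1);
        size_t room = COPY_PAGE_SIZE - off; /* bytes left in this page */
        size_t chunk = len < room ? len : room;

        if (!page_ok((uintptr_t)d - off))
            return -1;                      /* whole page is not accessible */

        memcpy(d, s, chunk);                /* inner loop: bulk copy in page */
        d += chunk;
        s += chunk;
        len -= chunk;
    }
    return 0;
}
```

The outer loop only runs once per destination page, so the cost of the
check is spread over up to a page of data. The price is the extra
bookkeeping (chunk length, bytes remaining), which is where the extra
registers go in the assembler version.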

I've fiddled with the instruction ordering to reduce result delays.
I'm not sure of the exact rules! I think they are:
	1 cycle for a memory read on StrongARM
	2 cycles for a memory read on XScale
I don't know if there are 'result delays' for arithmetic instructions,
but I've allowed a cycle for them in places.

Further optimisations will just increase the code size!  That
would speed up the benchmark, but displace real applications
from the cache.  I might have gone too far already in places.

If desperate, the misaligned copy could be optimised for long
buffers, and another 4 registers could be saved for the ldm/stm
bits.  But I doubt either would be significant.


David Laight: