Subject: Re: Kernel copyin/out optimizations for ARM...
To: None <firstname.lastname@example.org>
From: David Laight <email@example.com>
Date: 03/16/2002 21:50:45
On Sat, Mar 16, 2002 at 01:06:01PM +0000, David Laight wrote:
> > One idea though .... shouldn't we use `strt' _ONCE_ every page? i.e. either
> > the whole page is OK or the whole page is not .... doing it multiple times
> > is a bit silly....
> True - you either need to do it for every access, or once a page.
> To do it once a page you need nested loops - more scope for bugs.
> (and another 2 registers have to be saved - maybe not... wicked thought)
> - I feel a third version coming along....
OK, today's code - and I'm definitely not playing with it again!
It copies the aligned chunk of each page with ldmia/stmia.
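For anyone following along without the assembler in front of them, the
"check once per page, then bulk copy" nested loop can be sketched in C
roughly as below. This is only an illustration, not the code referred to
above: PAGE_SIZE, copy_by_page(), page_check_fn and count_and_allow() are
made-up names, the callback merely stands in for whatever per-page access
check the real routine does (the role `strt' plays), and the 16-byte
memcpy() stands in for a 4-register ldmia/stmia pair. It also ignores
source/destination alignment, which the real code has to deal with.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096u   /* assumed page size, must be a power of two */

/* Hypothetical per-page access check: non-zero means the page is OK. */
typedef int (*page_check_fn)(const void *addr);

/* Example check that allows everything and counts how often it is called. */
static int check_calls;
static int count_and_allow(const void *addr)
{
    (void)addr;
    check_calls++;
    return 1;
}

/* Copy len bytes, checking each destination page once. 0 on success, -1 on fault. */
static int copy_by_page(void *dst, const void *src, size_t len, page_check_fn page_ok)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert((PAGE_SIZE & (PAGE_SIZE - 1)) == 0);

    /* Outer loop: one iteration (and one check) per destination page. */
    while (len > 0) {
        size_t chunk = PAGE_SIZE - ((uintptr_t)d & (PAGE_SIZE - 1));
        if (chunk > len)
            chunk = len;
        if (!page_ok(d))
            return -1;
        len -= chunk;

        /* Inner loop: 16 bytes at a time, like a 4-register ldmia/stmia. */
        while (chunk >= 16) {
            memcpy(d, s, 16);
            d += 16;
            s += 16;
            chunk -= 16;
        }
        /* Whatever is left of this page goes a byte at a time. */
        while (chunk > 0) {
            *d++ = *s++;
            chunk--;
        }
    }
    return 0;
}
```

The extra state the nested version needs (bytes left in this page as well
as bytes left overall) is where the "another 2 registers" worry comes from.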
I've fiddled with instruction ordering to reduce result delays.
I'm not sure of the exact rules! I think they are:
1 cycle for memory read on StrongARM
2 cycles for memory read on XScale
I don't know if there are 'result delays' for arithmetic functions,
but I've allowed a cycle for them in places.
Further optimisations will just increase the code size! This
will speed up the benchmark, but displace real applications
from the cache. I might have gone too far already in places.
If desperate, the misaligned copy could be optimised for long
buffers, and another 4 registers could be saved for the ldm/stm
bits. But I doubt either would be significant.
David Laight: firstname.lastname@example.org