Subject: Re: Kernel copyin/out optimizations for ARM...
To: John Clark <j1clark@ucsd.edu>
From: Richard Earnshaw <rearnsha@arm.com>
List: port-arm
Date: 03/14/2002 15:18:08
j1clark@ucsd.edu said:
> Now if someone who is really familiar with the details of ARM
> implementations states that such multiple load ops are really dogger
> than dog slow, and the only truly fast ones are 32 bit load/store
> operations, I'll start looking at the DMA engine on the XScale
> companion chip... 

Hmm, I didn't say that LDMs would be slower than ldr, just that I doubted 
that they would make much difference to the performance here.  The one 
time I looked at this code, it didn't seem to be copying large amounts of 
data very often, so the overheads were presumably elsewhere.

Of course, if we are often calling this code with requests to transfer >64 
bytes of word-aligned data, then that might change, but that didn't seem 
to be the case.
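To make the size argument concrete, here is a minimal portable-C sketch of the kind of dispatch being discussed. It is not the real kernel copyin/copyout code, which also has to cope with faults on user addresses. A wide loop that moves eight words per iteration (standing in for an LDM/STM pair with eight registers) runs only when the request is larger than 64 bytes and both pointers are word-aligned. Everything else goes through a plain byte loop. The names sketch_copy and copy_words_bulk and the BULK_THRESHOLD constant are hypothetical, and the buffers are assumed not to overlap.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cut-off taken from the ">64 bytes" remark above. */
#define BULK_THRESHOLD 64

/* Move eight 32-bit words (32 bytes) per iteration, roughly the work of
 * one LDMIA/STMIA pair using eight registers, then finish word by word. */
static void copy_words_bulk(uint32_t *dst, const uint32_t *src, size_t nwords)
{
    while (nwords >= 8) {
        uint32_t w0 = src[0], w1 = src[1], w2 = src[2], w3 = src[3];
        uint32_t w4 = src[4], w5 = src[5], w6 = src[6], w7 = src[7];
        dst[0] = w0; dst[1] = w1; dst[2] = w2; dst[3] = w3;
        dst[4] = w4; dst[5] = w5; dst[6] = w6; dst[7] = w7;
        src += 8;
        dst += 8;
        nwords -= 8;
    }
    while (nwords-- > 0)
        *dst++ = *src++;
}

/* Hypothetical dispatcher: returns 1 if the bulk word path was used,
 * 0 if the plain byte loop was used. Buffers must not overlap. */
static int sketch_copy(void *dst, const void *src, size_t len)
{
    unsigned char *db = dst;
    const unsigned char *sb = src;
    int aligned = (((uintptr_t)dst | (uintptr_t)src) & 3u) == 0;

    if (len > BULK_THRESHOLD && aligned) {
        size_t nwords = len / 4;
        copy_words_bulk((uint32_t *)dst, (const uint32_t *)src, nwords);
        /* Copy the 0-3 trailing bytes that do not fill a whole word. */
        for (size_t i = nwords * 4; i < len; i++)
            db[i] = sb[i];
        return 1;
    }

    /* Small or misaligned request: the per-call overhead is assumed to
     * dominate, so a simple byte loop is used. */
    for (size_t i = 0; i < len; i++)
        db[i] = sb[i];
    return 0;
}
```

The extra size and alignment test costs only a few instructions per call. If most requests are small, as seemed to be the case here, the wide path rarely runs and any gain from LDM/STM would be lost in the rest of the per-call overhead.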

R.