Subject: Re: Kernel copyin/out optimizations for ARM...
To: John Clark <firstname.lastname@example.org>
From: Richard Earnshaw <email@example.com>
Date: 03/14/2002 15:18:08
> Now if someone who is really familiar with the details of ARM
> implementations states that such multiple load ops are really dogger
> than dog slow, and the only truly fast ones are 32 bit load/store
> operations, I'll start looking at the DMA engine on the XScale
> companion chip...
Hmm, I didn't say that LDMs would be slower than LDRs, just that I doubted
that they would make much difference to the performance here. The one
time I looked at this code, it didn't seem to be copying large amounts of
data very often, so the overheads were presumably elsewhere.
Of course, if we are often calling this code with requests to transfer >64
bytes of word-aligned data, then that might change, but that didn't seem
to be the case.
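
One way to settle the question would be to count how often the copy path
actually sees such requests. What follows is only a rough sketch in portable
C, not the kernel's copyin/copyout code: the names counted_copy, copy_stats
and BLOCK_THRESHOLD are made up for illustration, and memcpy stands in for
the real copy routine. It tallies the calls that move more than 64 bytes
with both pointers word-aligned, which are the ones that might benefit from
a multi-word (LDM/STM) loop.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed cut-off, taken from the ">64 bytes" figure in the discussion. */
#define BLOCK_THRESHOLD 64u

/* Hypothetical counters; not part of any real kernel interface. */
struct copy_stats {
    unsigned long total;        /* every call seen */
    unsigned long big_aligned;  /* len > 64 with word-aligned src and dst */
};

/* True if the pointer sits on a 4-byte (32-bit word) boundary. */
static int is_word_aligned(const void *p)
{
    return ((uintptr_t)p & 3u) == 0;
}

/*
 * Hypothetical wrapper: record whether this request would qualify for a
 * multi-word copy loop, then do the copy with plain memcpy.
 */
static void counted_copy(struct copy_stats *st, void *dst,
                         const void *src, size_t len)
{
    st->total++;
    if (len > BLOCK_THRESHOLD && is_word_aligned(src) && is_word_aligned(dst))
        st->big_aligned++;
    memcpy(dst, src, len);
}
```

If big_aligned stays a small fraction of total under a realistic workload,
that would support the guess that the overheads are elsewhere; if it
dominates, a multi-word loop might be worth a closer look.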