Subject: Re: Kernel copyin/out optimizations for ARM...
To: John Clark <j1clark@ucsd.edu>
From: Richard Earnshaw <rearnsha@arm.com>
List: port-arm
Date: 03/12/2002 10:38:39
j1clark@ucsd.edu said:
> While my main interest is the XScale, it seems that some form of
> improvement may be had for all the arm processors. I was looking at
> the copyin/out functions, and noticed that after all the checks there
> is only the check for 'is it bigger than 4 bytes, and is it 32 bit
> aligned', then copy 32 bit words...

> It seems that this could be optimized better to use the multiple load
> features of the cpu to improve copies. The Libc memcpy seems to do
> this.

> Is there some reason why this was not done in the kernel?

> In the case of the XScale, it is capable of doing a 64 bit transfer if
> things are 'lined up right', and two registers used, and caching is
> on, etc. etc.

I've always wondered why that code wasn't written using ldrt/strt for the
user-space accesses.  That would then use the hardware for permission
checking and eliminate the most expensive part of that code (doing the TLB
check manually).

Maybe there is some reason why that would not work (wrong page tables 
mapped?)
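
A rough sketch of the ldrt/strt idea, using GCC-style inline assembly.  The
helper names are hypothetical, and this assumes 32-bit ARM state, where
ldrt/strt perform the access with user-mode permissions even when executed
from a privileged mode.  On any other target the sketch falls back to plain
loads and stores so that it still compiles.  Fault recovery (what the kernel
does when the user address is bad) is deliberately not shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: load one word from a user-space address. */
static inline uint32_t load_user_word(const uint32_t *uaddr)
{
#if defined(__arm__) && !defined(__thumb__)
    uint32_t val;
    /* Post-indexed form with zero offset; Rt must differ from Rn,
     * hence the early-clobber output. */
    __asm__ volatile("ldrt %0, [%1], #0"
                     : "=&r"(val), "+r"(uaddr)
                     :
                     : "memory");
    return val;
#else
    return *uaddr;              /* non-ARM fallback, no permission trick */
#endif
}

/* Hypothetical helper: store one word to a user-space address. */
static inline void store_user_word(uint32_t *uaddr, uint32_t val)
{
#if defined(__arm__) && !defined(__thumb__)
    __asm__ volatile("strt %1, [%0], #0"
                     : "+r"(uaddr)
                     : "r"(val)
                     : "memory");
#else
    *uaddr = val;               /* non-ARM fallback */
#endif
}

/* Copy nwords words from "user" memory into a kernel buffer, leaving
 * the permission check to the MMU instead of testing it in software. */
static void copyin_words(uint32_t *kdst, const uint32_t *usrc, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++)
        kdst[i] = load_user_word(&usrc[i]);
}
```

Whether this is usable depends on the question above: the user's page tables
have to be the ones currently mapped when the copy runs.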

R.