Subject: Kernel copyin/out optimizations for ARM...
To: None <port-arm@netbsd.org>
From: John Clark <j1clark@ucsd.edu>
List: port-arm
Date: 03/11/2002 22:28:54
While my main interest is the XScale, it seems that some improvement
may be had for all the ARM processors.
I was looking at the copyin/out functions and noticed that, after all
the argument checks, the only optimization is a test for "is it bigger
than 4 bytes, and is it 32-bit aligned?", after which it copies 32-bit
words one at a time.
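For reference, here is roughly the strategy described above, written as a
plain C sketch. word_copy() is a hypothetical name, the byte-by-byte
handling of the tail and of misaligned pointers is my assumption, and a
real copyin/copyout also has to deal with user-space addresses and
faults, which is left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical word_copy(): memory-to-memory sketch only. The real
 * copyin/copyout fault handling for user addresses is omitted.
 */
static void
word_copy(void *dst, const void *src, size_t len)
{
	unsigned char *db = dst;
	const unsigned char *sb = src;

	assert(len == 0 || (dst != NULL && src != NULL));

	/* Word path only if a whole word remains and both ends are 32-bit aligned. */
	if (len >= 4 && (((uintptr_t)db | (uintptr_t)sb) & 3) == 0) {
		uint32_t *d = dst;
		const uint32_t *s = src;

		/* One 32-bit word per iteration (roughly one ldr/str pair). */
		while (len >= 4) {
			*d++ = *s++;
			len -= 4;
		}
		db = (unsigned char *)d;
		sb = (const unsigned char *)s;
	}

	/* Remaining bytes, or the whole copy if the pointers were misaligned. */
	while (len > 0) {
		*db++ = *sb++;
		len--;
	}
}
```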

It seems that this could be optimized further by using the CPU's
multiple load/store instructions (ldm/stm) to speed up copies. The libc
memcpy seems to do this.
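A minimal C sketch of the multiple-load idea, assuming both pointers have
already passed the 32-bit alignment check: move eight words per loop
iteration by loading them all into locals before storing them, which is
the shape of an ldmia/stmia pair using eight registers. block_copy() is a
hypothetical name, and whether a given compiler actually emits ldm/stm
for this depends on the compiler and its flags; hand-written assembly
makes it explicit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical block_copy(): 32 bytes (eight words) per iteration,
 * then leftover words, then leftover bytes. Both pointers must be
 * 32-bit aligned.
 */
static void
block_copy(void *dst, const void *src, size_t len)
{
	uint32_t *d = dst;
	const uint32_t *s = src;
	unsigned char *db;
	const unsigned char *sb;

	/* Caller is assumed to have checked 32-bit alignment already. */
	assert((((uintptr_t)dst | (uintptr_t)src) & 3) == 0);

	/* Eight loads, then eight stores: one "load multiple", one "store multiple". */
	while (len >= 32) {
		uint32_t r0 = s[0], r1 = s[1], r2 = s[2], r3 = s[3];
		uint32_t r4 = s[4], r5 = s[5], r6 = s[6], r7 = s[7];

		d[0] = r0; d[1] = r1; d[2] = r2; d[3] = r3;
		d[4] = r4; d[5] = r5; d[6] = r6; d[7] = r7;
		s += 8;
		d += 8;
		len -= 32;
	}

	/* Leftover whole words, one at a time. */
	while (len >= 4) {
		*d++ = *s++;
		len -= 4;
	}

	/* Final 0-3 bytes. */
	db = (unsigned char *)d;
	sb = (const unsigned char *)s;
	while (len > 0) {
		*db++ = *sb++;
		len--;
	}
}
```

For a 75-byte copy this does two 32-byte blocks, two single words and
three single bytes, instead of eighteen single words plus the bytes.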

Is there some reason why this was not done in the kernel?

In the case of the XScale, it can do a 64-bit transfer if things are
"lined up right", two registers are used, caching is on, and so on.
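A rough C sketch of the 64-bit case, assuming the fast path is taken only
when both pointers are 8-byte aligned and falling back to bytes
otherwise. The XScale implements ARMv5TE, whose LDRD/STRD instructions
move a doubleword through a pair of registers, which is presumably the
"two registers" condition above; whether uint64_t accesses compile to
LDRD/STRD depends on the compiler. dword_copy() is a hypothetical name,
and the cache and alignment conditions still have to hold on real
hardware for this to pay off.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical dword_copy(): doubleword copy when both pointers are
 * 8-byte aligned, byte copy for the tail or for misaligned pointers.
 */
static void
dword_copy(void *dst, const void *src, size_t len)
{
	unsigned char *db = dst;
	const unsigned char *sb = src;

	assert(len == 0 || (dst != NULL && src != NULL));

	if ((((uintptr_t)db | (uintptr_t)sb) & 7) == 0) {
		uint64_t *d = dst;
		const uint64_t *s = src;

		/* Two doublewords (four 32-bit registers' worth) per pass. */
		while (len >= 16) {
			uint64_t x = s[0], y = s[1];

			d[0] = x;
			d[1] = y;
			s += 2;
			d += 2;
			len -= 16;
		}

		/* At most one leftover doubleword. */
		while (len >= 8) {
			*d++ = *s++;
			len -= 8;
		}
		db = (unsigned char *)d;
		sb = (const unsigned char *)s;
	}

	/* Final bytes, or everything if the pointers were not 8-byte aligned. */
	while (len > 0) {
		*db++ = *sb++;
		len--;
	}
}
```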