Subject: Re: Kernel copyin/out optimizations for ARM...
To: None <port-arm@netbsd.org>
From: David Laight <david@l8s.co.uk>
List: port-arm
Date: 03/18/2002 15:51:56
On Mon, Mar 18, 2002 at 07:29:11AM -0800, Jason R Thorpe wrote:
> On Mon, Mar 18, 2002 at 01:38:19PM +0000, Richard Earnshaw wrote:
>
> > As far as I could tell from some experiments I ran, the predominant use of
> > copyin/copyout was for exec().
>
> I benchmarked David's new version using lmbench's "bw_unix", which
> does a copyin of the data into a socket, and a copyout back out to
> userland.
I wonder what size copies it is doing.
A good test would give figures for different sizes :-)
You could do the following peephole optimisation on the original code.
Replace:
loop:   ldr     r3,[r0],#4      @ r3 = *src++
        str     r3,[r1],#4      @ *dst++ = r3 (waits for the load)
        subs    r2,r2,#4        @ byte count -= 4
        bne     loop
with:
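        ldr     r3,[r0],#4      @ preload the first word
loop:   subs    r2,r2,#4        @ byte count -= 4
        str     r3,[r1],#4      @ store the word loaded last time round
        ldrne   r3,[r0],#4      @ load the next word only if more remain
        bne     loop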
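This (probably) fills all the load delay slots: the word loaded by the ldr is not stored until the next iteration, so the str no longer stalls waiting for the load result.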
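For anyone who finds the register shuffling hard to follow, here is a rough C model of the two loops. The function names are made up for illustration, and, like the assembly, it assumes the length is a non-zero multiple of four bytes and both buffers are word-aligned. It only shows the reordering (hoist the first load, store the previous word, load the next one conditionally); a C compiler is free to schedule the real code however it likes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the original loop: each word is stored
 * immediately after it is loaded. */
static void copy_words_simple(uint32_t *dst, const uint32_t *src, size_t nbytes)
{
    assert(nbytes != 0 && nbytes % 4 == 0);  /* same precondition as the asm */
    do {
        uint32_t w = *src++;   /* ldr  r3,[r0],#4 */
        *dst++ = w;            /* str  r3,[r1],#4 */
        nbytes -= 4;           /* subs r2,r2,#4   */
    } while (nbytes != 0);     /* bne  loop       */
}

/* Hypothetical model of the pipelined loop: the first load is hoisted
 * out of the loop, and each iteration stores the word loaded by the
 * previous one before fetching the next. */
static void copy_words_pipelined(uint32_t *dst, const uint32_t *src, size_t nbytes)
{
    assert(nbytes != 0 && nbytes % 4 == 0);  /* same precondition as the asm */
    uint32_t w = *src++;       /* ldr  r3,[r0],#4 (before the loop) */
    for (;;) {
        nbytes -= 4;           /* subs r2,r2,#4   */
        *dst++ = w;            /* str  r3,[r1],#4 */
        if (nbytes == 0)       /* bne falls through when the count hits zero */
            break;
        w = *src++;            /* ldrne r3,[r0],#4 */
    }
}
```

Both versions perform exactly one load and one store per word, so the pipelined form never reads past the end of the source buffer.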
David
--
David Laight: david@l8s.co.uk