Subject: Re: Kernel copyin/out optimizations for ARM...
To: None <port-arm@netbsd.org>
From: David Laight <david@l8s.co.uk>
List: port-arm
Date: 03/18/2002 15:51:56
On Mon, Mar 18, 2002 at 07:29:11AM -0800, Jason R Thorpe wrote:
> On Mon, Mar 18, 2002 at 01:38:19PM +0000, Richard Earnshaw wrote:
> 
>  > As far as I could tell from some experiments I ran, the predominant use of 
>  > copyin/copyout was for exec().
> 
> I benchmarked David's new version using lmbench's "bw_unix", which
> does a copyin of the data into a socket, and a copyout back out to
> userland.

I wonder what size copies it is doing.
A good test would give figures for different sizes :-)

You could do the following peephole optimisation on the original code.
Replace:
loop:	ldr	r3,[r0],#4
	str	r3,[r1],#4
	subs	r2,r2,#4
	bne	loop
with:
	ldr	r3,[r0],#4
loop:	subs	r2,r2,#4
	str	r3,[r1],#4
	ldrne	r3,[r0],#4
	bne	loop

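Which (probably) fills all the delay slots: the word fetched by ldrne
is not stored until two instructions later, so the str no longer
stalls waiting for its load to complete.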
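For illustration, here is a small C model of both loops. It assumes r0 is the source pointer, r1 the destination, r3 the scratch word, and r2 a byte count (a non-zero multiple of 4) that the `subs` line is meant to decrement in place. The function names are made up. The rewrite is plain software pipelining: the first load moves out of the loop, and each pass stores the word loaded on the previous pass before loading the next. A C compiler is free to reschedule all of this, so the sketch only shows the shape of the transformation, not the code a compiler would emit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the original loop: each word is stored by the instruction
   immediately after the one that loaded it (ldr; str; subs; bne). */
void copy_words_simple(uint32_t *dst, const uint32_t *src, size_t nbytes)
{
    /* Like the assembly, assume a non-zero multiple of 4 bytes. */
    assert(nbytes != 0 && nbytes % 4 == 0);
    do {
        uint32_t w = *src++;      /* ldr   r3,[r0],#4 */
        *dst++ = w;               /* str   r3,[r1],#4 */
        nbytes -= 4;              /* subs  r2,r2,#4   */
    } while (nbytes != 0);        /* bne   loop       */
}

/* Model of the rewritten loop: the first load is hoisted out, and the
   next word is loaded only when more remain (the conditional ldrne). */
void copy_words_pipelined(uint32_t *dst, const uint32_t *src, size_t nbytes)
{
    assert(nbytes != 0 && nbytes % 4 == 0);
    uint32_t w = *src++;          /* ldr   r3,[r0],#4 (before the loop) */
    do {
        nbytes -= 4;              /* subs  r2,r2,#4   */
        *dst++ = w;               /* str   r3,[r1],#4 */
        if (nbytes != 0)
            w = *src++;           /* ldrne r3,[r0],#4 */
    } while (nbytes != 0);        /* bne   loop       */
}
```

Both versions copy exactly nbytes / 4 words and never read past the end of the source. Like the original do/while loop, the hoisted first load assumes at least one word is copied.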

	David

-- 
David Laight: david@l8s.co.uk