Subject: Re: Performance of various memcpy()'s
To: David Laight <david@l8s.co.uk>
From: Bang Jun-Young <junyoung@mogua.com>
List: tech-userlevel
Date: 10/28/2002 18:46:23
[Oops, I'm sorry - previous mail was sent in euc-kr charset, due to a bug
 in Mutt.]

On Mon, Oct 28, 2002 at 09:01:11AM +0000, David Laight wrote:
> Given the significant performance improvement, I'd go for:
> 
> > +ENTRY(memcpy)
> > +	pushl	%esi
> > +	pushl	%edi
> > +
> > +	movl	12(%esp),%edi
> > +	movl	16(%esp),%esi
> > +	movl	20(%esp),%ecx
> > +	movl	%edi,%eax	/* return value */
> > +
> > +	movl	%ecx,%edx
> > +	cld			/* nope, copy forwards. */
> > +	shrl	$2,%ecx		/* copy by words */
> > +	rep
> > +	movsl
> 
> 	andl	$3,%edx
> 	jne	1f
> 	popl	%edi
> 	popl	%esi
> 	ret
>    1:
> > +	movl	%edx,%ecx
> > +	rep
> > +	movsb
> > +	popl	%edi
> > +	popl	%esi
> > +	ret

That is included in my new i686_copy{in,out}() in a (hopefully) cleaner
and shorter form. I'm still measuring how much it gains.
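
For readers who prefer C, the control flow of the quoted routine can be
sketched roughly as below. This is only an illustration of the
"copy by words, return early if there is no tail, else copy the
remaining bytes" idea; the name copy_words_then_tail is hypothetical and
the code is not taken from the patch or from i686_copy{in,out}().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical C rendering of the quoted assembly (not the actual patch).
 * Copies forwards only, so like memcpy() it assumes non-overlapping buffers.
 */
void *copy_words_then_tail(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t words = n >> 2;  /* shrl $2,%ecx : number of 4-byte words */
    size_t tail = n & 3;    /* andl $3,%edx : leftover bytes (0..3)  */

    assert((d != NULL && s != NULL) || n == 0);

    /* rep movsl : copy 4 bytes at a time */
    while (words-- > 0) {
        uint32_t w;
        memcpy(&w, s, sizeof w);  /* alignment-safe 32-bit load  */
        memcpy(d, &w, sizeof w);  /* alignment-safe 32-bit store */
        s += 4;
        d += 4;
    }

    if (tail == 0)          /* jne 1f not taken: skip the byte copy */
        return dst;

    while (tail-- > 0)      /* rep movsb : copy the last 1..3 bytes */
        *d++ = *s++;

    return dst;             /* %eax = original destination */
}
```

The early return corresponds to the andl/jne pair suggested above: when
the length is a multiple of four, the second rep setup is never reached.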

> 
> Or even finish off with:
> 	movb	(%esi),%cl
> 	decl	%edx
> 	movb	%cl,(%edi)
> 	jne	1b
> 	popl	%edi
> 	popl	%esi
> 	ret
> 
> 	David

Mixing word-size and byte-size registers is generally not a good idea.
The Intel manual says it hurts performance, and my memcpy tests
confirmed that.

Jun-Young

-- 
Bang Jun-Young <junyoung@mogua.com>