Subject: Re: Performance of various memcpy()'s
To: David Laight <david@l8s.co.uk>
From: Bang Jun-Young <junyoung@mogua.com>
List: tech-perform
Date: 10/23/2002 11:37:45
On Wed, Oct 23, 2002 at 12:04:37AM +0100, David Laight wrote:
> > BTW, where's 'rep movsw'? memcpy_rep_movsl is pretty much the same as
> > libc memcpy.
> 
> Indeed - however there is a significant saving in not using
> rep movsb to move the odd bytes.
> The setup cost for movsb is quite significant, on my athlon (IIRC)
> the cost for rep movsl is such that it is only worth using
> for moderate length copies - indeed using word copies for short
> transfers and MMX for long transfers could easily be a win over
> rep movsl.

This is really interesting. With the addition of just two lines of
code to memcpy, it's 20% faster for copies shorter than 512 bytes!
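
As a rough illustration of the idea being discussed (moving the bulk
as 32-bit words and then the 0-3 odd bytes with plain moves instead of
a second rep-prefixed string instruction), here is a minimal sketch in
portable C. It is not the actual two-line change, which is not shown
in this message; the function name copy_words_then_tail is
hypothetical, and the inner four-byte group merely stands in for what
a single movsl would do on i386.

```c
#include <stddef.h>

/*
 * Hypothetical sketch, not the libc routine: copy n bytes from src to
 * dst, four bytes at a time, then finish the 0-3 leftover bytes with
 * individual moves rather than a separate byte-copy loop.
 * Assumes the two regions do not overlap.
 */
static void *copy_words_then_tail(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t words = n / 4;          /* number of whole 32-bit words */

    while (words--) {
        /* Stand-in for one 32-bit move (movsl on i386). */
        d[0] = s[0];
        d[1] = s[1];
        d[2] = s[2];
        d[3] = s[3];
        d += 4;
        s += 4;
    }

    /* Odd bytes: at most three explicit moves, no loop setup. */
    switch (n & 3) {
    case 3: d[2] = s[2]; /* fall through */
    case 2: d[1] = s[1]; /* fall through */
    case 1: d[0] = s[0]; /* fall through */
    case 0: break;
    }
    return dst;
}
```

Whether this shape actually wins depends on the CPU; the point is only
that the tail needs no second rep setup.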

BTW, I noticed that our i386 memcpy() in libc checks for overlap,
even though the manpage says "to copy byte strings that overlap, use
memmove(3)."
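
For reference, an overlap check of the kind memmove(3) needs can be
sketched in C as below. This is an assumption about the general
technique, not a transcription of the libc source; the name
copy_overlap_safe is hypothetical. A single unsigned comparison
decides the direction: if dst lies inside [src, src + n), a forward
copy would overwrite source bytes before reading them, so the copy
runs backward.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch of a memmove-style copy: pick the copy direction
 * so that overlapping regions are handled correctly.
 */
static void *copy_overlap_safe(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (d == s || n == 0)
        return dst;

    /*
     * Unsigned difference: wraps to a huge value when dst < src, so
     * the test is true both when dst is below src and when dst is at
     * least n bytes above it. In either case forward copying is safe.
     */
    if ((uintptr_t)d - (uintptr_t)s >= n) {
        while (n--)
            *d++ = *s++;
    } else {
        /* dst is inside the source region: copy from the end. */
        d += n;
        s += n;
        while (n--)
            *--d = *--s;
    }
    return dst;
}
```

A memcpy() that follows the manpage strictly could drop the comparison
and the backward branch entirely.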

Jun-Young

-- 
Bang Jun-Young <junyoung@mogua.com>