Subject: Re: Performance of various memcpy()'s
To: <>
From: David Laight <david@l8s.co.uk>
List: port-i386
Date: 10/16/2002 00:12:01
On Wed, Oct 16, 2002 at 04:18:30AM +0900, Bang Jun-Young wrote:
> Hi,
> 
> About 14 months ago, I had some discussion on memcpy performance on
> the i386 platform here. Months later, I took a look into it again, and
> now am coming with (not-so-)new benchmark results (attached). The
> tests were performed on an Athlon XP 1800 with 256MB of DDR.

Well, the P4 (I don't have one) claims to execute 'rep movsl'
in the cache controller for suitably long and aligned transfers...
So whether SSE2 copies (which must also be aligned) are faster
is anybody's guess.

The other question is how many copies are actually long?
For short ones the red tape (checking length and alignment,
picking a code path) starts becoming significant.
As does the code size itself - unless it is part of the
permanent working set of the application.
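
To make the red tape concrete, here is a rough sketch of the kind of
dispatch an "optimised" copy has to do before it moves a single byte.
Everything in it is made up for illustration: copy_dispatch(),
BULK_THRESHOLD and the path enum are hypothetical names, the cutoff of
64 is a guess and not a measured value, and the plain memcpy() call
merely stands in for whatever 'rep movsl' or SSE2 loop is under test.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cutoff; the real break-even point is CPU-specific
 * and would have to be measured. */
#define BULK_THRESHOLD 64

enum copy_path { PATH_BYTES, PATH_BULK };

/* Hypothetical helper: copy n bytes (non-overlapping) and report
 * which path was taken, so the dispatch cost is visible. */
static enum copy_path
copy_dispatch(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;
	size_t bulk;

	/* The red tape: one length test and one alignment test per call. */
	if (n < BULK_THRESHOLD ||
	    (((uintptr_t)d | (uintptr_t)s) & (sizeof(uint32_t) - 1)) != 0) {
		/* Short or misaligned: simple byte loop. */
		while (n-- > 0)
			*d++ = *s++;
		return PATH_BYTES;
	}

	/* Long and aligned: copy whole 32-bit words in one go.
	 * memcpy() is only a stand-in for the real bulk loop. */
	bulk = n & ~(sizeof(uint32_t) - 1);
	memcpy(d, s, bulk);
	d += bulk;
	s += bulk;
	n -= bulk;

	/* Trailing 0-3 bytes. */
	while (n-- > 0)
		*d++ = *s++;
	return PATH_BULK;
}
```

With these assumptions a 16 byte copy pays for both tests and still
ends up in the byte loop, which is the case that matters if most
copies are short.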

Oh - why not write assembler in assembler?
The 'asm' statements are epically hard to read :-(
(I might try decoding them tomorrow.)

	David