Subject: Re: Performance of various memcpy()'s
To: David Laight <david@l8s.co.uk>
From: Thor Lancelot Simon <tls@rek.tjls.com>
List: port-i386
Date: 10/16/2002 11:07:58
On Wed, Oct 16, 2002 at 12:58:52PM +0100, David Laight wrote:
> On Wed, Oct 16, 2002 at 04:18:30AM +0900, Bang Jun-Young wrote:
> > Hi,
> > 
> > About 14 months ago, I had some discussion on memcpy performance on
> > the i386 platform here. Months later, I took a look into it again, and
> > now am coming with (not-so-)new benchmark results (attached). The
> > tests were performed on an Athlon XP 1800 with 256 MB of DDR memory.
> > 
> > From the results, it's obvious that memcpy() using MMX insns is the
> > best for in-cache sized data, typically 50-100% faster than plain old
> > memcpy for data <= 32 KB.
> 
> I've done some experiments on my Slot A Athlon 700.

I believe the cache controller on that Athlon is significantly different
from that on the more recent respins of the design, FWIW.

There was a comp.arch thread on maximum attainable bandwidth on the P4
and Athlon several months ago that culminated in a series of SSE2 copy
implementations that reached something like 85% of theoretical bandwidth;
it was very impressive.  I'll try to dig up the code; I think I saved it
here somewhere.
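
For anyone who wants comparable numbers in the meantime, here is a
minimal sketch of a copy-bandwidth measurement. It is not the comp.arch
code and not the benchmark attached earlier in the thread; it only times
the libc memcpy() over a range of buffer sizes. The names
copy_mb_per_sec() and print_sweep() are made up for illustration, and
the 64 MB of traffic per size is an arbitrary assumption.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper: copy `size` bytes `iters` times with the libc
 * memcpy() and return the observed throughput in MB/s.
 * Returns -1.0 on allocation failure or if the copy did not match,
 * and 0.0 if the run was too short for clock() to resolve.
 */
double copy_mb_per_sec(size_t size, int iters)
{
    /* Calling through a volatile pointer keeps the compiler from
     * eliding or merging the repeated copies. */
    void *(*volatile copy_fn)(void *, const void *, size_t) = memcpy;
    unsigned char *src = malloc(size);
    unsigned char *dst = malloc(size);
    double secs, result;
    clock_t start, end;
    int i;

    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 0xA5, size);   /* fill the source with a known pattern */
    memset(dst, 0, size);      /* touch the destination pages up front */

    start = clock();
    for (i = 0; i < iters; i++)
        copy_fn(dst, src, size);
    end = clock();

    secs = (double)(end - start) / CLOCKS_PER_SEC;
    if (memcmp(dst, src, size) != 0)
        result = -1.0;
    else if (secs <= 0.0)
        result = 0.0;
    else
        result = ((double)size * iters) / (1024.0 * 1024.0) / secs;

    free(src);
    free(dst);
    return result;
}

/* Hypothetical driver: sweep 1 KB .. 16 MB, moving about 64 MB per size
 * (an assumed figure), so in-cache and out-of-cache sizes can be compared. */
void print_sweep(void)
{
    const size_t total = (size_t)64 << 20;
    size_t size;

    for (size = 1024; size <= ((size_t)16 << 20); size *= 2) {
        int iters = (int)(total / size);
        printf("%8lu KB  %10.1f MB/s\n",
               (unsigned long)(size / 1024), copy_mb_per_sec(size, iters));
    }
}
```

Calling print_sweep() from main() prints one line per buffer size.
clock() reports CPU time at coarse resolution, so the small sizes are
only meaningful with enough iterations, and the figures say nothing
about what an MMX or SSE2 copy loop would achieve on the same machine.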