Subject: Re: Performance of various memcpy()'s
To: port-i386@netbsd.org
From: wleeson@indigo.ie
List: port-i386
Date: 10/16/2002 19:31:20
Hi,
	There was an excellent article on this kind of thing at the GDC. It covered telling the CPU what to pull into cache before the data is actually used (prefetching), and it showed some really impressive performance gains. The article and presentation are available on the AMD web site.
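
	To give a rough idea of the technique, here is a minimal sketch of a copy loop that issues prefetch hints ahead of the data it is about to read. It is not the code from the AMD article. The function name copy_with_prefetch, the 64-byte chunk size and the 256-byte prefetch distance are all assumptions made up for illustration and untuned, and __builtin_prefetch is a GCC/Clang builtin, so this assumes one of those compilers.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical values, not taken from the thread or the AMD article. */
#define CHUNK          64   /* assumed cache-line size in bytes */
#define PREFETCH_AHEAD 256  /* how far ahead to prefetch; an untuned guess */

/* Hypothetical helper: copy n bytes from src to dst (non-overlapping),
 * hinting the CPU to start loading source data before it is needed. */
void *copy_with_prefetch(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n >= CHUNK) {
        /* Only prefetch addresses that are still inside the source buffer.
         * Arguments: address, 0 = read access, 0 = low temporal locality.
         * This is only a hint; the CPU is free to ignore it. */
        if (n > PREFETCH_AHEAD)
            __builtin_prefetch(s + PREFETCH_AHEAD, 0, 0);

        memcpy(d, s, CHUNK);  /* compilers typically inline fixed-size copies */
        d += CHUNK;
        s += CHUNK;
        n -= CHUNK;
    }

    /* Copy whatever is left over (fewer than CHUNK bytes). */
    if (n > 0)
        memcpy(d, s, n);

    return dst;
}
```

	It is called just like memcpy(), e.g. copy_with_prefetch(dst, src, len). Whether it beats the plain libc memcpy() at all depends on the CPU, the buffer size and the prefetch distance, so it would need benchmarking on the actual hardware.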

Regards,
	Willy

On Wednesday, 16 October 2002 at 11:07:58 -0400, Thor Lancelot Simon wrote:
> On Wed, Oct 16, 2002 at 12:58:52PM +0100, David Laight wrote:
> > On Wed, Oct 16, 2002 at 04:18:30AM +0900, Bang Jun-Young wrote:
> > > Hi,
> > > 
> > > About 14 months ago, I had some discussion on memcpy performance on
> > > i386 platform here. Months later, I took a look into it again, and
> > > now am coming with (not-so-)new benchmark results (attached). The
> > > tests were performed on Athlon XP 1800 and DDR 256MB. 
> > > 
> > > From the results, it's obvious that memcpy() using MMX insns is the
> > > best for in-cache sized data, typically 50-100% faster than plain old
> > > memcpy for data <= 32 KB.
> > 
> > I've done some experiments on my slot-A athlon 700.
> 
> I believe the cache controller on that Athlon is significantly different
> from that on the more recent respins of the design, FWIW.
> 
> There was a comp.arch thread on maximum attainable bandwidth on the P4
> and athlon several months ago that culminated in a series of SSE2 copy
> implementations that did something like 85% of theoretical; it was very
> impressive.  I'll try to dig up the code, I think I saved it here 
> somewhere.
> 
>