Subject: Re: Fast memcpy(3) making use of MMX instructions
To: Andreas Persson <pap.is@home.se>
From: Bang Jun-Young <bjy@mogua.org>
List: tech-perform
Date: 08/24/2001 22:55:35
On Mon, Aug 20, 2001 at 04:44:52PM +0200, Andreas Persson wrote:
> On Mon, Aug 20, 2001 at 09:58:03PM +0900, Bang Jun-Young <bjy@mogua.org> wrote:
> > From the results:
> > - Utilizing MMX for memcpy gives _no_ gain on Intel processor. Only
> >   AMD processors can benefit from it. I guess Linux people already
> >   knew that.
> Not true. I am currently writing a memcpy and the initial results are very
> good. I haven't coded assembly since the Pentium so I'm a bit rusty...
> Anyway, here are some nice (preliminary) numbers:
> When doing memcpy where the source is not in cache, I get about 350-600
> MB/s on my 800 MHz Pentium III.
> ./memcpy_test: 65536 bytes, 140903 clocks, 0.465114 bytes/clock, throughput 372.091439 mb/s
> 1024
> ./memcpy_test: 65536 bytes, 87083 clocks, 0.752569 bytes/clock, throughput 602.055510 mb/s
> 1024
> ./memcpy_test: 65536 bytes, 133499 clocks, 0.490910 bytes/clock, throughput 392.728035 mb/s
> 
> When doing smaller copies (1-4k) with the source in cache, I got about
> 1.1-1.9 GB/s. Now I've gone and tried to optimize it further, and it's much
> slower for that case. Go figure. Also, sometimes I get extremely strange
> results:
> ./memcpy_test: 4096 bytes, 40 clocks, 102.400000 bytes/clock
> ./memcpy_test: 4096 bytes, 12 clocks, 341.333333 bytes/clock
> I've verified that it's not a bug on my part.

Hmm, where can I find your code?
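
For anyone checking the quoted figures: they are consistent with
dividing bytes by clocks and scaling by an 800 MHz clock, with 1 MB
taken as 10^6 bytes. The sketch below reproduces that arithmetic. The
function names are hypothetical and this is not the code behind
memcpy_test, which has not been posted.

```c
#include <stdint.h>

/* Bytes copied per CPU clock cycle. */
static double bytes_per_clock(uint64_t bytes, uint64_t clocks)
{
    return (double)bytes / (double)clocks;
}

/*
 * Throughput in MB/s, assuming 1 MB = 10^6 bytes and a fixed clock of
 * cpu_mhz (10^6 clocks per second), so the two 10^6 factors cancel.
 */
static double throughput_mb_s(uint64_t bytes, uint64_t clocks, double cpu_mhz)
{
    return bytes_per_clock(bytes, clocks) * cpu_mhz;
}
```

For example, throughput_mb_s(65536, 140903, 800.0) comes out at roughly
372.09, matching the first quoted line.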

> 
> > - Memory transfer performance of Pentium III (machine) is even lower
> >   than that of Athlon (machine).
> Well, I don't believe your code is optimized for either the Pentium III or
> the Athlon.

Look again at the earlier results from the plain i386 memcpy(3) in
NetBSD libc:

                Athlon 1000 MHz   Pentium III 800 MHz   Ratio (PIII/Athlon)
64 bytes                    122                   131   1.07
4096 bytes                15300                 64755   4.23
100000 bytes             377938               1748680   4.62
1000000 bytes           7055049              19380587   2.75

Obviously the Athlon does memcpy (a lot) faster than its competitor
on NetBSD/i386, doesn't it?
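
The Ratio column is the Pentium III figure divided by the Athlon
figure. A minimal sketch of that calculation follows. The helper names
are hypothetical. The second helper additionally assumes the figures
are clock-cycle counts, which the table does not state.

```c
/* Pentium III figure divided by Athlon figure for one table row. */
static double p3_over_athlon(double athlon, double p3)
{
    return p3 / athlon;
}

/*
 * Assumption: if a figure is a count of clock cycles, dividing by the
 * clock rate in MHz gives elapsed microseconds.
 */
static double clocks_to_usec(double clocks, double cpu_mhz)
{
    return clocks / cpu_mhz;
}
```

For the 4096-byte row, p3_over_athlon(15300, 64755) is about 4.23. If
the counts are clocks, the gap in elapsed time would be wider than the
Ratio column shows, since the Athlon also runs at the higher clock
rate.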

Jun-Young

-- 
Bang Jun-Young <bjy@mogua.org>