Subject: Re: Fast memcpy(3) making use of MMX instructions
To: None <tech-perform@netbsd.org>
From: Andreas Persson <pap.is@home.se>
List: tech-perform
Date: 08/20/2001 16:44:52
On Mon, Aug 20, 2001 at 09:58:03PM +0900, Bang Jun-Young <bjy@mogua.org> wrote:
> From the results:
> - Utilizing MMX for memcpy gives _no_ gain on Intel processor. Only
>   AMD processors can benefit from it. I guess Linux people already
>   knew that.
Not true. I am currently writing a memcpy and the initial results are very
good. I haven't coded assembly since the Pentium, so I'm a bit rusty...
Anyway, here are some nice (preliminary) numbers:
When doing memcpy where the source is not in cache, I get about 350-600
MB/s on my 800 MHz Pentium III.
./memcpy_test: 65536 bytes, 140903 clocks, 0.465114 bytes/clock, throughput 372.091439 mb/s
1024
./memcpy_test: 65536 bytes, 87083 clocks, 0.752569 bytes/clock, throughput 602.055510 mb/s
1024
./memcpy_test: 65536 bytes, 133499 clocks, 0.490910 bytes/clock, throughput 392.728035 mb/s
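
The throughput figures above are consistent with bytes/clock multiplied by
the 800 MHz clock rate. A minimal sketch of that arithmetic follows; the
function names are hypothetical (memcpy_test itself is not shown), and it
assumes 1 MB = 10^6 bytes and a fixed clock frequency.

```c
#include <stdint.h>

/* Bytes copied per CPU clock cycle. */
double bytes_per_clock(uint64_t bytes, uint64_t clocks)
{
    return (double)bytes / (double)clocks;
}

/*
 * Throughput in MB/s, assuming 1 MB = 10^6 bytes and a constant clock.
 * (bytes/clock) * (cpu_mhz * 10^6 clocks/s) / (10^6 bytes/MB)
 * simplifies to (bytes/clock) * cpu_mhz.
 */
double throughput_mb_per_s(uint64_t bytes, uint64_t clocks, double cpu_mhz)
{
    return bytes_per_clock(bytes, clocks) * cpu_mhz;
}
```

For example, throughput_mb_per_s(65536, 140903, 800.0) gives roughly 372 MB/s,
matching the first line of output.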

When doing smaller copies (1-4 KB) with the source in cache, I got about
1.1-1.9 GB/s. Now I've gone and tried to optimize it further, and it's much
slower for that case. Go figure. Also, sometimes I get extremely strange
results:
./memcpy_test: 4096 bytes, 40 clocks, 102.400000 bytes/clock
./memcpy_test: 4096 bytes, 12 clocks, 341.333333 bytes/clock
I've verified that it's not a bug on my part.
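
One common way to keep odd readings like these from dominating a summary is
to time the same copy several times and report the median clock count. This
is only a sketch under that assumption, not what memcpy_test does; the
helper names are hypothetical, and the samples are assumed to be clock
counts already collected by whatever timer is in use.

```c
#include <stdint.h>
#include <stdlib.h>

/* qsort comparator for uint64_t, written to avoid overflow on subtraction. */
static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a;
    uint64_t y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/*
 * Sorts the samples in place and returns the median clock count.
 * n must be greater than 0. For an even n, the lower of the two
 * middle values is returned.
 */
uint64_t median_clocks(uint64_t *samples, size_t n)
{
    qsort(samples, n, sizeof samples[0], cmp_u64);
    return samples[(n - 1) / 2];
}
```

A couple of implausibly small readings among otherwise similar samples then
leave the reported value unchanged, while still being visible in the raw
data.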

> - Memory transfer performance of Pentium III (machine) is even lower
>   than that of Athlon (machine).
Well, I don't believe your code is optimized for either the Pentium III or
the Athlon.

>Jun-Young
>
>-- 
>Bang Jun-Young <bjy@mogua.org>
>

-- 
Andreas Persson
pap.is@home.se