tech-perform: Re: Fast memcpy(3) making use of MMX instructions

Subject: Re: Fast memcpy(3) making use of MMX instructions
To: None <tech-perform@netbsd.org>
From: Andreas Persson <pap.is@home.se>
List: tech-perform
Date: 08/17/2001 22:52:05

On Mon, Aug 13, 2001 at 08:30:40PM +0900, Bang Jun-Young <bjy@mogua.org> wrote:
>At first, I expected huge improvement (at least the author insisted
>that he got 250% improvement), but the result was disappointing.
>Many of the optimization techniques used in the code made memcpy slower, and
>surprisingly, plain i386 code was the fastest among them!
Most of that improvement probably comes from prefetching the data. In your
test, though, the data will always be in the L1 cache, since you initialise
it just before calling memcpy(), so prefetching has nothing to gain.
Additional benefits also come from not polluting the cache as much (with
non-temporal prefetching and stores), but that performance gain is hard to
measure. These techniques do have a certain amount of overhead; for small
copies, rep movsd is almost certainly faster. Also, this code is not
optimized at all for the Athlon, which likes things a bit different from
the Intel CPUs. Btw, the prefetch versions dump core on my system because
Intel and AMD have different prefetch instructions.
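
To make the small-copy versus large-copy trade-off concrete, here is a rough
sketch in C of a copy routine that falls back to plain memcpy() below a size
cutoff and only issues prefetch hints for larger buffers. It is not the code
under discussion: copy_with_prefetch, SMALL_COPY_MAX, CACHE_LINE and
PREFETCH_AHEAD are hypothetical names, and the values are untuned guesses. It
assumes a GCC- or Clang-compatible compiler, whose __builtin_prefetch lets the
compiler choose a prefetch instruction for the target CPU instead of
hard-coding one vendor's opcode.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cutoff; the real crossover has to be measured per CPU. */
#define SMALL_COPY_MAX 4096
/* Assumed cache line size and prefetch distance; both are tuning guesses. */
#define CACHE_LINE 64
#define PREFETCH_AHEAD (4 * CACHE_LINE)

/* Hypothetical helper for illustration only. */
void *copy_with_prefetch(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t off = 0;

    assert(n == 0 || (dst != NULL && src != NULL));

    /* Small copies: setup overhead outweighs any prefetch benefit. */
    if (n <= SMALL_COPY_MAX)
        return memcpy(dst, src, n);

    /* Large copies: hint a line we will need soon, then copy one line. */
    while (n - off >= CACHE_LINE) {
        /* Only hint addresses that lie inside the source buffer. */
        if (off + PREFETCH_AHEAD < n) {
            /* rw = 0 (read), locality = 0 (low temporal locality). */
            __builtin_prefetch(s + off + PREFETCH_AHEAD, 0, 0);
        }
        memcpy(d + off, s + off, CACHE_LINE);
        off += CACHE_LINE;
    }

    /* Copy whatever is left after the last full line. */
    memcpy(d + off, s + off, n - off);
    return dst;
}
```

Whether the prefetching path wins anything depends on the source data not
already being in cache, so a benchmark should touch a buffer larger than the
cache between runs rather than initialising the data right before the copy.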

>Of course, I shouldn't forget to mention some gave me a little 
>performance improvement indeed when buffer sizes were large (>1MB).  
>
>Now my question is: is copying 100MB of data back and forth occurring
>frequently in the real world as well? Where can this code fit best?
Very rarely, at best. But try my suggestions above and let me know if they
help.

>Any comments are welcome and appreciated,
>
>Jun-Young
>
>-- 
>Bang Jun-Young <bjy@mogua.org>
>

-- 
Andreas Persson
pap.is@home.se