Subject: Fast memcpy(3) making use of MMX instructions
To: None <tech-perform@netbsd.org>
From: Bang Jun-Young <bjy@mogua.org>
List: tech-perform
Date: 08/13/2001 20:30:40
After reading the article "Optimizing CPU to Memory Accesses
on the SGI Visual Workstations 320 and 540", I decided to write
code that does the same on NetBSD/i386 myself, and here is the
result of the work done during the last couple of months:

  http://my.dreamwiz.com/bangjy/fast_memcpy/fast_memcpy-20010813.tar.gz

At first, I expected a huge improvement (at least the author claimed
that he got a 250% improvement), but the result was disappointing.
Many of the optimization techniques used in the code made memcpy slower,
and surprisingly, plain i386 code was the fastest among them!

Of course, I should mention that some of them did give me a small
performance improvement when buffer sizes were large (>1MB).
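
For anyone who wants to check this kind of size-dependent result on their
own machine, a minimal timing harness might look like the sketch below.
It is not the code from the tarball above: the names copy_fn,
copy_bytewise, bench_copy and run_sweep are made up for illustration, the
byte loop is only a stand-in for whatever routine is under test, and
clock() measures CPU time at a fairly coarse resolution, so treat the
numbers as rough.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Signature shared by every copy routine under test (hypothetical name). */
typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Deliberately naive baseline; a stand-in for any candidate routine. */
void *copy_bytewise(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n-- > 0)
        *d++ = *s++;
    return dst;
}

/*
 * Copy `size` bytes `iterations` times with `fn` and return the CPU time
 * used in seconds. Returns -1.0 on bad arguments, allocation failure, or
 * if the destination does not match the source afterwards.
 */
double bench_copy(copy_fn fn, size_t size, int iterations)
{
    copy_fn volatile call = fn;  /* volatile: keep every call in the loop */
    unsigned char *src, *dst;
    clock_t start, end;
    double result;
    size_t k;
    int i;

    assert(fn != NULL);
    if (size == 0 || iterations <= 0)
        return -1.0;

    src = malloc(size);
    dst = malloc(size);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }

    for (k = 0; k < size; k++)
        src[k] = (unsigned char)(k * 31 + 7);
    memset(dst, 0, size);        /* touch pages so faults are not timed */

    start = clock();
    for (i = 0; i < iterations; i++)
        call(dst, src, size);
    end = clock();

    result = (memcmp(dst, src, size) == 0)
        ? (double)(end - start) / CLOCKS_PER_SEC
        : -1.0;
    free(src);
    free(dst);
    return result;
}

/* Print throughput for buffer sizes from 1KB to 16MB, 256MB copied each. */
void run_sweep(const char *name, copy_fn fn)
{
    const size_t total = (size_t)256 << 20;
    size_t size;

    for (size = 1024; size <= ((size_t)16 << 20); size *= 4) {
        double secs = bench_copy(fn, size, (int)(total / size));

        if (secs > 0.0)
            printf("%-14s %9lu bytes: %8.1f MB/s\n", name,
                (unsigned long)size, 256.0 / secs);
        else
            printf("%-14s %9lu bytes: failed or too fast to time\n",
                name, (unsigned long)size);
    }
}
```

Calling run_sweep("memcpy", memcpy) and then run_sweep("bytewise",
copy_bytewise) from main() prints one line per buffer size for each
routine, which makes it easy to see where (if anywhere) a candidate
starts to beat the libc version. Small buffers stay in cache while large
ones are limited by memory bandwidth, so the ranking can differ between
the two ends of the sweep.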

Now my question is: does copying 100MB of data back and forth occur
frequently in the real world as well? Where would this code fit best?

Any comments are welcome and appreciated,

Jun-Young

-- 
Bang Jun-Young <bjy@mogua.org>