Subject: Re: Fast memcpy(3) making use of MMX instructions
To: Andreas Persson <pap.is@home.se>
From: Bang Jun-Young <bjy@mogua.org>
List: tech-perform
Date: 08/18/2001 11:23:35

On Fri, Aug 17, 2001 at 10:52:05PM +0200, Andreas Persson wrote:
> On Mon, Aug 13, 2001 at 08:30:40PM +0900, Bang Jun-Young <bjy@mogua.org> wrote:
> >At first, I expected a huge improvement (at least, the author insisted
> >that he got a 250% improvement), but the result was disappointing.
> >Many of the optimization techniques used in the code made memcpy slower,
> >and surprisingly, plain i386 code was the fastest among them!
> Most of that improvement probably comes from prefetching the data. Of course
> your data will always be in the L1 cache, since you initialise it just before
> calling memcpy(). Additional benefits also come from not polluting the cache
> as much (with non-temporal prefetching and stores), but the performance
> gain is hard to measure there. These techniques do have a certain amount
> of overhead; for small copies, rep movsd is almost certainly faster. Also,
> this code is not optimized at all for the Athlon, which likes things a bit
> different from the Intel CPUs. Btw, the prefetch versions dump core on my
> system due to Intel and AMD having different prefetch instructions.
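
One way to take the L1 effect out of the measurement is to copy from a
working set much larger than the cache, so that each chunk is cold when
memcpy() reaches it. The following is only a rough sketch: the names
CHUNK, NCHUNKS and bench_cold_memcpy() are made up for illustration, and
the 16 MiB buffer size is an assumption that should be raised on machines
with larger caches.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical sizes: the working set is meant to be far larger than
 * the cache, so that each chunk is cold when it is copied. */
#define CHUNK   (4 * 1024)
#define NCHUNKS (4 * 1024)          /* 16 MiB per buffer */

/* Copies NCHUNKS chunks from distinct source addresses and returns the
 * elapsed CPU time in seconds, or -1.0 on allocation or copy failure. */
static double bench_cold_memcpy(void)
{
    size_t total = (size_t)CHUNK * NCHUNKS;
    unsigned char *src = malloc(total);
    unsigned char *dst = malloc(total);
    clock_t t0, t1;
    size_t i;
    int ok;

    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }

    /* Touch every page once before timing; filling both buffers also
     * pushes the start of src back out of a small cache. */
    memset(src, 0xA5, total);
    memset(dst, 0, total);

    t0 = clock();
    for (i = 0; i < NCHUNKS; i++)
        memcpy(dst + i * CHUNK, src + i * CHUNK, CHUNK);
    t1 = clock();

    ok = memcmp(src, dst, total) == 0;   /* sanity check on the copy */
    free(src);
    free(dst);
    return ok ? (double)(t1 - t0) / CLOCKS_PER_SEC : -1.0;
}
```

Swapping memcpy() in the timed loop for the routine under test should give
a fairer comparison for large copies. clock() is coarse, so the loop may
need to be repeated several times to get a stable number.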

I suspect your machine is not a Pentium III (or 4?). The `prefetch*'
instructions were introduced in the Pentium III and cause an invalid
opcode exception on earlier machines.
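
A memcpy that uses these instructions could check CPUID before picking
the prefetch path. Below is a minimal sketch, assuming GCC or Clang and
their <cpuid.h> helper; the function names have_sse_prefetch() and
have_3dnow_prefetch() are hypothetical. It tests the SSE bit (leaf 1, EDX
bit 25) for `prefetch*' and the 3DNow! bit (extended leaf 0x80000001, EDX
bit 31) for AMD's own prefetch instruction.

```c
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>      /* GCC/Clang CPUID helper (assumed toolchain) */
#endif

/* Returns 1 if CPUID reports SSE, which covers prefetchnta/t0/t1/t2;
 * returns 0 otherwise, including on non-x86 targets. */
static int have_sse_prefetch(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;                   /* leaf 1 not available */
    return (edx >> 25) & 1;         /* EDX bit 25: SSE */
#else
    return 0;
#endif
}

/* Returns 1 if CPUID reports 3DNow!, which covers AMD's prefetch and
 * prefetchw; returns 0 otherwise. */
static int have_3dnow_prefetch(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(0x80000001, &eax, &ebx, &ecx, &edx))
        return 0;                   /* extended leaf not available */
    return (edx >> 31) & 1;         /* EDX bit 31: 3DNow! */
#else
    return 0;
#endif
}
```

The caller would fall back to the plain i386 copy when both functions
return 0, instead of trapping on an unsupported opcode.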

Jun-Young

-- 
Bang Jun-Young <bjy@mogua.org>