Subject: Re: m68k bcopy implementation performance results
To: None <port-m68k@NetBSD.ORG>
From: Ignatios Souvatzis <is@beverly.rhein.de>
List: port-m68k
Date: 12/18/1996 21:26:46
> Thanks to all who sent me results of the bcopy benchmark --- sorry I
> didn't mention that it might take days to execute.

Hehe, I scaled it down by a factor of 10 or 100 for some of the tests, at
least on the 68030 machine... I don't have days of uptime. All my
machines are in my bedroom, and I'm not deaf yet, nor do I want to be :-)

> I was a bit surprised by the results.  The two optimized versions were
> between 65 and 105% faster on the '020, '040, and '060.  But there was
> barely any improvement on '030 systems.  I don't know why --- the
> benchmarks should be large enough to eliminate (or at least make
> visible) any data cache effects.

Well, my results clearly show some 15% improvement (if I recall correctly;
I sent the details to jtc) on the 68030 ... maybe my machine has different
memory access times (5-2-2-2 burst access) than the other '030 testers'
machines.

Additionally, I did some weird tests manipulating the A3000's RAM controller.
If I turn off burst accesses, performance for long copies increases by
another 10 or 15%. I suspect this is caused by write allocation. So my
suggestion is to switch write allocation off before copypage/zeropage, and
switch it back on afterwards, on the 030 (the same code can be used on the
020, which ignores this bit in the CACR). I'll try to benchmark this "soon".
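
To make the suggestion concrete, here is a rough sketch in C of what I mean.
Everything in it is an assumption for illustration: cacr_read()/cacr_write()
are hypothetical helpers backed by a plain variable so the sketch compiles
anywhere (on real hardware they would be "movec" to/from the CACR in
supervisor mode), PAGE_BYTES is a made-up page size, and the WA bit value
should be checked against the MC68030 manual. It is not the kernel's actual
copypage.

```c
#include <stdint.h>
#include <string.h>

/* Assumed 68030 CACR write-allocate bit; verify against the manual. */
#define CACR030_WA 0x2000u
/* Hypothetical page size, for illustration only. */
#define PAGE_BYTES 4096u

/* Stand-in for the real CACR so the sketch runs on any host. */
uint32_t fake_cacr;

/* Hypothetical accessors; real code would use movec in supervisor mode. */
uint32_t cacr_read(void)        { return fake_cacr; }
void     cacr_write(uint32_t v) { fake_cacr = v; }

/* Copy one page with write allocation switched off, then restore the CACR. */
void copypage_no_wa(void *dst, const void *src)
{
    uint32_t saved = cacr_read();

    cacr_write(saved & ~CACR030_WA);  /* WA off for the long copy */
    memcpy(dst, src, PAGE_BYTES);     /* placeholder for the real copy loop */
    cacr_write(saved);                /* put back whatever was set before */
}
```

Saving and restoring the whole register, rather than setting the bit
unconditionally afterwards, keeps the routine harmless if write allocation
was already off.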

They further show _NO_ improvement on the 68060 for Duff's device if
branch prediction is on (but a factor of two improvement over the old
bcopy with branch prediction off), and a slowdown of a few percent for
the simple unrolled loop relative to both the old code and Duff's device
if branch prediction is on. (Yes, it is on now for the 68060: I finally
found time to run a few tests and make sure I had found the caching bug
that hit me half a year ago.)
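
For anyone who hasn't looked at the two variants, here is a minimal C sketch
of the general techniques being compared. It is not the benchmarked m68k
assembly; the function names, the unroll factor of 4, and the longword
granularity are my own assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Simple unrolled loop: four longwords per pass, then a tail loop. */
void copy_unrolled(uint32_t *dst, const uint32_t *src, size_t n)
{
    size_t blocks = n / 4;
    size_t tail = n % 4;

    while (blocks--) {
        dst[0] = src[0];
        dst[1] = src[1];
        dst[2] = src[2];
        dst[3] = src[3];
        dst += 4;
        src += 4;
    }
    while (tail--)
        *dst++ = *src++;
}

/* Duff's device: jump into the middle of the unrolled body to handle
 * the remainder, so no separate tail loop is needed. */
void copy_duff(uint32_t *dst, const uint32_t *src, size_t n)
{
    size_t passes;

    if (n == 0)
        return;
    passes = (n + 3) / 4;       /* number of times the do-loop body is entered */
    switch (n % 4) {
    case 0: do { *dst++ = *src++;
    case 3:      *dst++ = *src++;
    case 2:      *dst++ = *src++;
    case 1:      *dst++ = *src++;
            } while (--passes > 0);
    }
}
```

The only difference is how the remainder is handled: the unrolled loop pays
for a second small loop, while Duff's device pays for one computed jump up
front.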

Just thought people might like to hear about this.

Regards,
	Ignatios Souvatzis