Subject: Re: bcopy
To: None <port-alpha@NetBSD.ORG>
From: Chris G Demetriou <Chris_G_Demetriou@BALVENIE.PDL.CS.CMU.EDU>
List: port-alpha
Date: 08/12/1995 20:55:36
> This one performs within 5 percent of the
> OSF1 libc bcopy. I tested it very thoroughly for correctness.

I didn't bother testing for correctness (if it's fast enough, it
doesn't matter if it works right, right?  8-), but i did do some
performance comparisons, and i thought i'd share them with everybody
on the list.

All tests were 10240 8k non-overlapping bcopy()s, with varying
alignments, and tests on each type of CPU were done with identical
binaries.
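
For anyone who wants to repeat this sort of measurement, here is a minimal sketch of a timing loop along the lines described above. It is not the harness used for these numbers. The names copy_fn, ref_bcopy and time_copies are hypothetical, ref_bcopy is only a memmove()-based stand-in for the routine under test, and the 0..7 offset range is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

#define NCOPIES 10240   /* copies per run, as in the tests above */
#define COPYLEN 8192    /* 8k per copy */
#define MAXOFF  8       /* assumed: byte offsets 0..7 */

/* bcopy-style argument order: source first, then destination. */
typedef void (*copy_fn)(const void *src, void *dst, size_t len);

/* Separate buffers, so the copies never overlap. */
static unsigned char srcbuf[COPYLEN + MAXOFF];
static unsigned char dstbuf[COPYLEN + MAXOFF];

/* Stand-in routine (hypothetical); substitute the bcopy under test. */
void ref_bcopy(const void *src, void *dst, size_t len)
{
    memmove(dst, src, len);
}

/* Returns CPU seconds for NCOPIES copies at the given byte offsets. */
double time_copies(copy_fn fn, size_t srcoff, size_t dstoff)
{
    volatile copy_fn f = fn;  /* discourage the compiler from eliding calls */
    clock_t start, end;
    int i;

    assert(srcoff < MAXOFF && dstoff < MAXOFF);
    start = clock();
    for (i = 0; i < NCOPIES; i++)
        f(srcbuf + srcoff, dstbuf + dstoff, COPYLEN);
    end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling time_copies(ref_bcopy, 0, 0) gives a "same alignment" case and time_copies(ref_bcopy, 1, 5) a "different alignment" case. clock() has coarse resolution on some systems, so several runs per case are worth doing.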

The OSF/1 libc bcopy was tested under OSF/1, and the NetBSD bcopy()s
(and Trevor's) were tested under NetBSD.


With a 233MHz 21064A processor:

			src & dst same alignment	different alignment
OSF/1 libc bcopy	~0.219s				~0.346s

NetBSD kernel bcopy	~5.503s*1			~5.503s*1

NetBSD libc 'C' bcopy	~0.561s				~7.34s

Trevor's asm bcopy	~0.223s				~0.350s


With a 266MHz 21164 processor:

			src & dst same alignment	different alignment
OSF/1 libc bcopy	~0.125s				~0.285s

NetBSD kernel bcopy	*2				*2

NetBSD libc 'C' bcopy	~0.46s				~8.5s *3

Trevor's asm bcopy	~0.125s				~0.266s


*1 -- very stable; i never saw more than 0.005s difference between _any_
of the runs with this code

*2 -- weird.  very weird.  in general:
	Time per 10240 copies ~= 6.4s + (srcoff + (7 - dstoff)) * 0.19s

*3 -- highly variable; some cases were repeatably as much as a second
more or less than this, and the machine was otherwise idle.
consistently longer than the 21064A, though!
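
To make the fit in note *2 concrete, here is a small sketch that just evaluates that formula. The function name predicted_seconds is hypothetical, and the 0..7 range for the offsets is an assumption (suggested by the "7 - dstoff" term, but not stated above).

```c
#include <assert.h>

/* Fitted constants from note *2 (seconds per 10240 copies). */
#define BASE_SECONDS     6.4
#define PER_BYTE_SECONDS 0.19

/* Evaluates: 6.4s + (srcoff + (7 - dstoff)) * 0.19s */
double predicted_seconds(int srcoff, int dstoff)
{
    /* Assumed offset range; not stated in the original note. */
    assert(srcoff >= 0 && srcoff <= 7);
    assert(dstoff >= 0 && dstoff <= 7);
    return BASE_SECONDS + (srcoff + (7 - dstoff)) * PER_BYTE_SECONDS;
}
```

Under that assumption the fit runs from 6.4s (srcoff 0, dstoff 7) up to about 9.06s (srcoff 7, dstoff 0).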


it's interesting to note that:

	(1) the NetBSD libc 'C' and NetBSD kernel bcopy, in this set
	    of tests, seem to tickle the 21164 in a less-than-friendly
	    way.  I don't understand why this is.

	(2) Trevor's bcopy() appears to do _better_ on the 21164 than
	    the OSF/1 bcopy -- but not by much.  8-)


My hat's off to Trevor, for a job well done.  Now if only somebody
would do the same thing for the division and modulus routines...
8-)



chris