Subject: bcopy, bzero, copypage, and zeropage
To: None <port-m68k@NetBSD.ORG, tech-kern@NetBSD.ORG>
From: J.T. Conklin <jtc@cygnus.com>
List: tech-kern
Date: 12/09/1996 16:46:30
Those of you who read port-m68k know I've been fooling around with
improved implementations of block memory operation functions.  Several
people were kind enough to run some benchmarks on systems I don't have
access to.  As a result, I have an implementation which performs quite
nicely on all m68k family parts.

The reason I started down this path was that profiles on my Sun3
showed that bcopy and bzero together took more than 10% of the time.
This clearly indicated that there was lots of room for beneficial
microoptimizations.

However, most of the time for bcopy and bzero came from the calls in
pmap_copy_page and pmap_zero_page.  Clearly functions that copy/zero
only page-sized objects can be made faster than general-purpose
copy/zero functions.  When I dug a little deeper, I discovered that
several m68k ports have copypage functions that do exactly that.
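
To illustrate why a fixed-size routine can beat the general one, here
is a rough sketch in C (not the actual locore.s code from any port).
The names sketch_zeropage and sketch_copypage, the (src, dst) argument
order, and the 4096-byte page size are all assumptions made for the
example; real page sizes differ between ports.  Because the length and
alignment are known in advance, there is no length argument, no
alignment fixup, and no tail handling, just an unrolled loop with a
constant trip count:

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed page size for illustration only; real ports differ. */
#define SKETCH_PAGE_SIZE 4096u

/* Longwords handled per loop iteration (the unroll factor). */
#define SKETCH_UNROLL 4u

/*
 * Zero exactly one page.  The caller must pass a page-aligned
 * address, so no alignment or residual-byte handling is needed.
 */
static void
sketch_zeropage(void *page)
{
	uint32_t *p = page;
	size_t n = SKETCH_PAGE_SIZE / (SKETCH_UNROLL * sizeof(uint32_t));

	do {
		p[0] = 0;
		p[1] = 0;
		p[2] = 0;
		p[3] = 0;
		p += SKETCH_UNROLL;
	} while (--n != 0);
}

/*
 * Copy exactly one page from src to dst.  Both must be page
 * aligned and must not overlap.
 */
static void
sketch_copypage(const void *src, void *dst)
{
	const uint32_t *s = src;
	uint32_t *d = dst;
	size_t n = SKETCH_PAGE_SIZE / (SKETCH_UNROLL * sizeof(uint32_t));

	do {
		d[0] = s[0];
		d[1] = s[1];
		d[2] = s[2];
		d[3] = s[3];
		s += SKETCH_UNROLL;
		d += SKETCH_UNROLL;
	} while (--n != 0);
}
```

A hand-written assembly version would follow the same shape, with the
count-down loop being the kind of place where a decrement-and-branch
instruction fits naturally.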

Some suggestions:

	* create a corresponding zeropage function --- in my
	  profiles, zeroing was done more than copying.

	* change all ports that currently call bcopy and bzero
	  to call copypage and zeropage.

	* possibly move copypage and zeropage from each port's
	  locore.s to m68k/m68k/copy.s.

	* once the versions are consolidated, clean up and optimize
	  the versions.  For example, use dbf instead of subq/jne;
	  don't use back-to-back move16 insns (on '040s); only provide
	  an implementation for a specific cpu variant when configured
	  as such.

	* etc.

Thoughts?

	--jtc