Subject: bcopy optimisation
To: None <port-arm32@NetBSD.ORG>
From: Mark Brinicombe <amb@physig4.ph.kcl.ac.uk>
List: port-arm32
Date: 07/03/1996 17:39:17
On the subject of performance, there are a number of kernel routines that
could be optimised but that I have not got to yet. Several of these do not
require the coder to understand the ins and outs of the kernel and could be
done by users.

One example is bcopy. A high-performance bcopy/memcpy that copes with
overlapping regions is needed to replace the current one.
Since the bcopy code is not kernel-specific, the routine could be developed
and benchmarked against the current one in a user process.
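
As a rough illustration of the user-process approach, here is a minimal
sketch of a test harness in C. All the names in it (copy_fn, copy_reference,
copy_is_correct, time_copy) are made up for this example, and libc memmove
stands in for "the current one"; the real kernel routine would need to be
dropped in to get a meaningful comparison.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical common signature, using bcopy's (src, dst, len) order. */
typedef void (*copy_fn)(const void *src, void *dst, size_t len);

/* Stand-in baseline (an assumption): libc memmove, which allows overlap. */
static void copy_reference(const void *src, void *dst, size_t len)
{
    memmove(dst, src, len);
}

/* Compare a candidate with memmove for small overlapping shifts,
 * with the destination both above and below the source. */
static int copy_is_correct(copy_fn fn, size_t len)
{
    unsigned char got[512], want[512];
    size_t shift, i;

    if (len > 256)
        return 0;               /* stay inside the scratch buffers */
    for (shift = 0; shift <= 3; shift++) {
        /* destination above source */
        for (i = 0; i < sizeof got; i++)
            got[i] = want[i] = (unsigned char)(i * 7 + 1);
        fn(got + 8, got + 8 + shift, len);
        memmove(want + 8 + shift, want + 8, len);
        if (memcmp(got, want, sizeof got) != 0)
            return 0;
        /* destination below source */
        for (i = 0; i < sizeof got; i++)
            got[i] = want[i] = (unsigned char)(i * 7 + 1);
        fn(got + 8 + shift, got + 8, len);
        memmove(want + 8, want + 8 + shift, len);
        if (memcmp(got, want, sizeof got) != 0)
            return 0;
    }
    return 1;
}

/* CPU seconds taken by `iterations` calls of fn copying len bytes. */
static double time_copy(copy_fn fn, const void *src, void *dst,
                        size_t len, unsigned long iterations)
{
    unsigned long i;
    clock_t start = clock();

    for (i = 0; i < iterations; i++)
        fn(src, dst, len);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

A candidate would first be run through copy_is_correct and then timed with
time_copy at each size of interest. clock() is coarse, so the iteration
count needs to be large for the short copies.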

In addition to making it fast, typically by using the LDM and STM
instructions, consideration needs to be given to the sizes being copied.
Logging statistics for the bcopy routine show that it is called for certain
copy sizes far more often than for others.
The most common sizes are 12, 8, 128, 6, 4, 16 and 2 bytes, in that order.
This may mean that the best performance will be gained if these sizes are
detected and given specially coded paths.
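
To show what spotting the common sizes might look like, here is a sketch in
C rather than ARM assembler. The name bcopy_sized and the COPY_FIXED macro
are invented for this example. Each short case reads the whole block into a
temporary before writing it out, which keeps overlapping copies safe; a real
version would presumably do the same thing with registers. Whether the
dispatch actually pays for itself would have to be measured.

```c
#include <stddef.h>
#include <string.h>

/* Load all n bytes into a temporary, then store them, so that
 * overlapping src/dst are handled. n must be a constant. */
#define COPY_FIXED(n)                   \
    do {                                \
        unsigned char tmp[n];           \
        memcpy(tmp, src, n);            \
        memcpy(dst, tmp, n);            \
    } while (0)

/* Hypothetical bcopy replacement: fixed-size paths for the short
 * sizes seen most often, general path for everything else. */
static void bcopy_sized(const void *src, void *dst, size_t len)
{
    switch (len) {
    case 2:  COPY_FIXED(2);  return;
    case 4:  COPY_FIXED(4);  return;
    case 6:  COPY_FIXED(6);  return;
    case 8:  COPY_FIXED(8);  return;
    case 12: COPY_FIXED(12); return;
    case 16: COPY_FIXED(16); return;
    default:
        /* Includes the 128 byte case, which would more likely want
         * an unrolled block-copy loop than a temporary. */
        memmove(dst, src, len);
        return;
    }
}
```

The cases are listed in numerical order here for readability; ordering the
tests by observed frequency is another thing that could be tried.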

If you want to go further, the alignment of the source and destination
addresses needs to be looked at, again to help design the best bcopy for
the job.
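
One way to gather the alignment information is to classify each call by how
the two addresses sit relative to a word boundary. The sketch below assumes
4-byte words; the enum and function names are made up for this example. LDM
and STM work on word-aligned addresses, so only the first two classes could
use them directly on both sides (the second after a few leading bytes have
been copied singly).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical classification of a (src, dst) pair, assuming
 * 4-byte words. */
enum copy_alignment {
    ALIGN_BOTH_WORD,    /* both addresses are word-aligned           */
    ALIGN_SAME_OFFSET,  /* same non-zero offset within a word        */
    ALIGN_MISMATCHED    /* different offsets: data must be shifted   */
};

static enum copy_alignment classify_alignment(const void *src,
                                              const void *dst)
{
    uintptr_t s = (uintptr_t)src & 3u;  /* byte offset within a word */
    uintptr_t d = (uintptr_t)dst & 3u;

    if (s == 0 && d == 0)
        return ALIGN_BOTH_WORD;
    if (s == d)
        return ALIGN_SAME_OFFSET;
    return ALIGN_MISMATCHED;
}
```

Counting how often each class turns up, alongside the size statistics,
should show which cases are worth optimising.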

Any takers for the job, or do I have to add this to my to-do list?

Cheers,
				Mark

-- 
Mark Brinicombe				amb@physig.ph.kcl.ac.uk
Research Associate			http://www.ph.kcl.ac.uk/~amb/
Department of Physics			tel: 0171 873 2894
King's College London			fax: 0171 873 2716