Subject: Re: bcopy optimisation
To: None <port-arm32@NetBSD.ORG>
From: Olly Betts <olly@MANTIS.CO.UK>
List: port-arm32
Date: 07/04/1996 14:28:36
"Mark Brinicombe" writes:
>[Fast bcopy required]
>
>In addition to making it fast, typically by using the LDM and STM instructions,
>consideration needs to be given to the sizes being copied. Logging statistics
>for the bcopy routine shows that it is called for certain sizes of copy far
>more frequently than for others.
>The most common sizes are 12, 8, 128, 6, 4, 16 and 2 bytes, in that order.
>This suggests that the best performance may come from spotting these sizes
>and coding them specially.
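A rough C sketch of spotting the common sizes at run time might look like the following. The names sized_copy and copy_bytes are hypothetical, the default case is a plain byte loop standing in for a proper LDM/STM copy, and, unlike bcopy, it takes (dst, src, len) and assumes the buffers do not overlap.

```c
#include <stddef.h>

/* Byte-by-byte copy. When n is a compile-time constant and this is
 * inlined, the compiler is free to unroll it completely. */
static inline void copy_bytes(unsigned char *d, const unsigned char *s, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        d[i] = s[i];
}

/* Hypothetical dispatcher: the common sizes get their own constant-length
 * path; anything else uses the general loop (a stand-in for a real
 * LDM/STM-based copy). Assumes non-overlapping buffers, unlike bcopy. */
void *sized_copy(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Case order mirrors the frequency ranking quoted above; a compiler
     * may turn the switch into a jump table, so order need not matter. */
    switch (len) {
    case 12:  copy_bytes(d, s, 12);  break;
    case 8:   copy_bytes(d, s, 8);   break;
    case 128: copy_bytes(d, s, 128); break;
    case 6:   copy_bytes(d, s, 6);   break;
    case 4:   copy_bytes(d, s, 4);   break;
    case 16:  copy_bytes(d, s, 16);  break;
    case 2:   copy_bytes(d, s, 2);   break;
    default:  copy_bytes(d, s, len); break;
    }
    return dst;
}
```

The hope is that each constant-length case becomes a short straight-line sequence; whether that actually beats the generic path is something timings would have to show.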

If most of these copies are done with a constant length, i.e.:

memcpy( a, b, 12 );

rather than:

memcpy( a, b, len );

where len is usually 12, then it might be better to get the compiler to
spot them and call a tailored routine, _memcpy12( a, b ), which could be
inlined.  Here's a version of _memcpy2 to clarify the idea:

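_memcpy2                  ; R0 = destination, R1 = source
 LDRB   R2,[R1]           ; load source byte 0
 LDRB   R3,[R1,#1]        ; load source byte 1
 STRB   R2,[R0]           ; store byte 0
 STRB   R3,[R0,#1]        ; store byte 1
 MOV    PC,R14            ; return; R0 still holds the destination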

Does anyone know how easy it is to get GCC to do this sort of thing?
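One possible approach, assuming GCC: __builtin_constant_p() reports whether an expression is a compile-time constant, so a wrapper macro can pick a tailored routine only when the length is known at compile time. FAST_MEMCPY, fast_copy12 and fast_copy2 below are hypothetical names, and the routines are written in C rather than assembler so the sketch is self-contained.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical tailored routines in the spirit of _memcpy2 above.
 * static inline lets the compiler inline them at each call site. */
static inline void *fast_copy2(void *dst, const void *src)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    d[0] = s[0];
    d[1] = s[1];
    return dst;
}

static inline void *fast_copy12(void *dst, const void *src)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t i;

    for (i = 0; i < 12; i++)   /* fixed trip count: easy to unroll */
        d[i] = s[i];
    return dst;
}

/* GCC extension: __builtin_constant_p(n) is 1 only if the compiler can
 * prove n is constant (it may give 0 when not optimising), so the choice
 * is normally made at compile time. Anything else goes to memcpy(). */
#define FAST_MEMCPY(d, s, n)                                      \
    (__builtin_constant_p(n) && (n) == 12 ? fast_copy12((d), (s)) \
     : __builtin_constant_p(n) && (n) == 2 ? fast_copy2((d), (s)) \
     : memcpy((d), (s), (n)))
```

With a variable length the macro simply calls the ordinary memcpy(). Recent GCC versions often expand small constant-length memcpy() calls inline on their own, so it is worth checking the generated code before adding wrappers like this.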

>If you want to go further, the alignment of the src and destination addresses
>needs to be looked at, again to help design the best bcopy for the job.
>
>Any takers for the job, or do I have to add this to my todo list?
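As a sketch of what looking at alignment could mean, the routine below (hypothetical name align_aware_copy) copies whole 32-bit words only when source and destination share the same offset within a word, which is the case where LDM/STM could be used on ARM. It assumes 4-byte words and non-overlapping buffers, and relies on GCC's may_alias attribute so the word accesses stay within C's aliasing rules.

```c
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang extension: a 32-bit word type that may alias any object,
 * so accessing byte buffers through it does not break aliasing rules. */
typedef uint32_t __attribute__((may_alias)) word_t;

/* Assumes 4-byte words and non-overlapping buffers. When src and dst
 * share the same offset within a word: copy bytes up to a word
 * boundary, then whole words (where LDM/STM could be used on ARM),
 * then the tail. Otherwise fall back to bytes; a real routine would
 * load aligned words and shift/merge them instead. */
void *align_aware_copy(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if ((((uintptr_t)d ^ (uintptr_t)s) & 3) == 0) {
        /* Same misalignment: byte-copy until both are word aligned. */
        while (len > 0 && ((uintptr_t)d & 3) != 0) {
            *d++ = *s++;
            len--;
        }
        /* Both pointers are now 4-byte aligned. */
        while (len >= 4) {
            *(word_t *)d = *(const word_t *)s;
            d += 4;
            s += 4;
            len -= 4;
        }
    }
    /* Remaining tail, or the whole copy when alignments differ. */
    while (len-- > 0)
        *d++ = *s++;
    return dst;
}
```

The mismatched-alignment case is the one that needs the most thought in a real ARM version, since it is where plain LDM/STM cannot be used directly.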

I'm happy to write this sort of stuff, though I may not get it done very
quickly.  I've already written a very fast strlen(), if that's any use.

Olly