Subject: Re: bcopy optimisation
To: None <port-arm32@NetBSD.ORG>
From: Olly Betts <olly@mantis.co.uk>
List: port-arm32
Date: 07/09/1996 00:18:32
In traditional net style, I've just spotted an error I introduced in munging
the code into my previous mail.

Olly Betts writes:
>[snip]
>|_alignedwordcpy|
>|_alignedwordcpylp3|
> SUBS    R2,R2,#4
> LDRGE   R3,[ip],#4
> STRGE   R3,[R1],#4
>; to unroll this loop, repeat these 3 instructions
> SUBGES  R2,R2,#4
> LDRGE   R3,[ip],#4
> STRGE   R3,[R1],#4
>;
> BNE     |_alignedwordcpylp3|

Make that: BGT     |_alignedwordcpylp3|

> MOVS    PC,R14

And it's actually 21% faster than SharedCLibrary on aligned word blocks (not
25%).
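
For anyone wondering why the branch has to be BGT rather than BNE, here is a rough C model of the loop's control flow. It is only a sketch under stated assumptions: model_loop, BRANCH_NE, BRANCH_GT and the cap parameter are names invented for this illustration, R2 is treated as a signed byte count small enough that the subtractions never overflow, and no memory is actually copied.

```c
#include <assert.h>

/* Hypothetical model of the quoted loop, not the real routine.
   'bytes' stands in for R2. Returns the number of words the loop would
   copy, or -1 if it has not exited after 'cap' passes. */
enum branch { BRANCH_NE, BRANCH_GT };

static long model_loop(long bytes, enum branch br, long cap)
{
    long r2 = bytes;   /* remaining byte count (R2) */
    long words = 0;    /* words "copied" so far */
    long pass;

    assert(cap > 0);
    for (pass = 0; pass < cap; pass++) {
        r2 -= 4;                   /* SUBS   R2,R2,#4 */
        if (r2 >= 0) {
            words++;               /* LDRGE / STRGE */
            r2 -= 4;               /* SUBGES R2,R2,#4 (skipped when LT) */
            if (r2 >= 0)
                words++;           /* LDRGE / STRGE */
        }
        /* Final branch: loop again if taken, otherwise fall out. */
        if (br == BRANCH_NE) {
            if (r2 == 0)           /* BNE falls through only on exactly zero */
                return words;
        } else {
            if (r2 <= 0)           /* BGT falls through on zero or negative */
                return words;
        }
    }
    return -1;                     /* still looping after 'cap' passes */
}
```

In this model, model_loop(12, BRANCH_GT, 1000) returns 3, while model_loop(12, BRANCH_NE, 1000) returns -1: with BNE the count steps from 4 to 0 to -4 and never lands on zero at the branch. The BNE form only exits when the length is a non-zero multiple of 8 bytes.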

Olly