Subject: Re: about powerpc version of in{,4}_cksum
To: None <port-powerpc@netbsd.org>
From: Matt Thomas <matt@3am-software.com>
List: port-powerpc
Date: 07/30/2002 11:11:06
At 11:01 AM 7/30/2002, Martin J. Laubach wrote:
>| ! "lwzu 7,4(%2);"
>| ! "lwzu 8,4(%2);"
>| ! "lwzu 9,4(%2);"
>| ! "lwzu 10,4(%2);"
>
> BTW, does it pay to unroll the loop? While playing with bzero
>I noticed that an unrolled version was somewhat slower than the
>straightforward loop on my G4, probably due to cache intricacies.
>How is that for in_cksum?
Another reason not to use lwzu is that it makes each load depend on
the previous one: every lwzu updates the base register, and the next
load needs that updated value to form its address. That stalls the
pipeline.
lwz %r7,4(%r2)
lwz %r8,8(%r2)
lwz %r9,12(%r2)
lwzu %r10,16(%r2)
This would be better, because the four loads are independent and can
be executed in parallel (depending on the number of load/store units).
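A minimal portable C sketch of the same contrast, for readers who think better in C than in asm. It assumes a 32-bit-word ones' complement sum folded to 16 bits (the RFC 1071 sum, without the final complement); the names cksum_chained, cksum_offsets, load32 and fold32 are hypothetical and are not NetBSD's in_cksum. The first variant bumps the pointer after every load, like a chain of lwzu; the second loads at fixed offsets from one base and bumps it once per four words, like lwz/lwz/lwz/lwzu (offsets start at 0 here because C has no pre-increment addressing). Whether the second is actually faster depends on the CPU and on what the compiler emits; an optimizing compiler may generate similar code for both.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Load one 32-bit word in host byte order, no alignment assumption. */
static uint32_t load32(const unsigned char *p)
{
    uint32_t w;
    memcpy(&w, p, sizeof w);
    return w;
}

/* Fold a 64-bit sum of 32-bit words into a 16-bit ones' complement
 * sum using end-around carries. */
static uint16_t fold32(uint64_t sum)
{
    sum = (sum & 0xffffffffu) + (sum >> 32);
    sum = (sum & 0xffffffffu) + (sum >> 32);
    sum = (sum & 0xffff) + (sum >> 16);
    sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Hypothetical variant 1: pointer updated after every load, so each
 * address depends on the previous update (the lwzu pattern). */
uint16_t cksum_chained(const unsigned char *p, size_t nwords)
{
    uint64_t sum = 0;
    while (nwords--) {
        sum += load32(p);
        p += 4;
    }
    return fold32(sum);
}

/* Hypothetical variant 2: four loads at fixed offsets from one base,
 * a single pointer update per iteration, and two accumulators so the
 * adds also form two shorter dependency chains instead of one. */
uint16_t cksum_offsets(const unsigned char *p, size_t nwords)
{
    uint64_t s0 = 0, s1 = 0;
    while (nwords >= 4) {
        s0 += load32(p + 0);
        s1 += load32(p + 4);
        s0 += load32(p + 8);
        s1 += load32(p + 12);
        p += 16;
        nwords -= 4;
    }
    while (nwords--) {          /* leftover words */
        s0 += load32(p);
        p += 4;
    }
    return fold32(s0 + s1);
}
```

Both variants compute the same value; only the shape of the dependency chains differs.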
--
Matt Thomas Internet: matt@3am-software.com
3am Software Foundry WWW URL: http://www.3am-software.com/bio/matt/
Cupertino, CA Disclaimer: I avow all knowledge of this message