At 11:01 AM 7/30/2002, Martin J. Laubach wrote:
| ! "lwzu 7,4(%2);"
| ! "lwzu 8,4(%2);"
| ! "lwzu 9,4(%2);"
| ! "lwzu 10,4(%2);"

BTW, does it pay to unroll the loop? While playing with bzero I noticed that an unrolled version was somewhat slower than the straightforward loop on my G4, probably due to cache intricacies. How is that for in_cksum?
Another reason not to use lwzu is that it forces each load to depend on the previous one, because each lwzu needs the base register that the previous lwzu just updated. That makes your pipeline stall.

    lwz     %r7,4(%r2)
    lwz     %r8,8(%r2)
    lwz     %r9,12(%r2)
    lwzu    %r10,16(%r2)

would be better, because the loads can be executed in parallel (depending on the number of load/store units).

-- 
Matt Thomas               Internet:   matt%3am-software.com@localhost
3am Software Foundry      WWW URL:    http://www.3am-software.com/bio/matt/
Cupertino, CA             Disclaimer: I avow all knowledge of this message
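A rough, portable C analogue of the two loop shapes discussed above, offered only as a sketch: it is not the NetBSD in_cksum, and nothing guarantees that a given compiler turns the indexed loads into lwz plus a single pointer update on PowerPC. The names sum_simple, sum_unrolled and fold16 are hypothetical. Both functions compute a 16-bit ones'-complement sum over 32-bit words using a 64-bit accumulator. The unrolled body reads p[0]..p[3] at fixed offsets from one base pointer and advances that pointer once per iteration, so no load has to wait for a base-register update from the previous load. The offsets start at 0 rather than 4 because the C pointer is not pre-decremented the way the lwzu loop's base is.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a wide accumulator into a 16-bit ones'-complement sum
   (end-around carry: 2^16 is congruent to 1 mod 0xffff). */
static uint16_t fold16(uint64_t acc)
{
    while (acc >> 16)
        acc = (acc & 0xffff) + (acc >> 16);
    return (uint16_t)acc;
}

/* Straightforward loop: one 32-bit word per iteration. */
static uint16_t sum_simple(const uint32_t *p, size_t nwords)
{
    uint64_t acc = 0;
    for (size_t i = 0; i < nwords; i++)
        acc += p[i];
    return fold16(acc);
}

/* Unrolled by four. The four loads use fixed offsets from the same
   base and do not depend on one another; only the single "p += 4"
   at the end of the body updates the base (cf. the one lwzu). */
static uint16_t sum_unrolled(const uint32_t *p, size_t nwords)
{
    uint64_t acc = 0;
    size_t i = 0;

    while (nwords - i >= 4) {
        uint32_t w0 = p[0], w1 = p[1], w2 = p[2], w3 = p[3];
        acc += (uint64_t)w0 + w1 + w2 + w3;
        p += 4;
        i += 4;
    }
    /* Handle the 0..3 leftover words one at a time. */
    for (; i < nwords; i++)
        acc += *p++;
    return fold16(acc);
}
```

Both versions return the same sum for any length; whether the unrolled one is actually faster on a particular G4 still has to be measured, as Martin's bzero experience suggests.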