Subject: Re: about powerpc version of in{,4}_cksum
To: None <port-powerpc@netbsd.org>
From: Matt Thomas <matt@3am-software.com>
List: port-powerpc
Date: 07/30/2002 11:11:06
At 11:01 AM 7/30/2002, Martin J. Laubach wrote:
>|  !                            "lwzu 7,4(%2);"
>|  !                            "lwzu 8,4(%2);"
>|  !                            "lwzu 9,4(%2);"
>|  !                            "lwzu 10,4(%2);"
>
>   BTW, does it pay to unroll the loop? While playing with bzero
>I noticed that an unrolled version was somewhat slower than the
>straight forward loop on my G4, probably due to cache intricacies.
>How is that for in_cksum?

Another reason not to use lwzu is that it makes each load depend on
the previous one, because every lwzu has to wait for the base register
that the one before it just updated.  That makes your pipeline stall.

         lwz %r7,4(%r2)
         lwz %r8,8(%r2)
         lwz %r9,12(%r2)
         lwzu %r10,16(%r2)

would be better, because all four loads use the same base register
(only the last one updates it), so they can be executed in parallel
(depending on the number of load/store units).
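For illustration, here is a rough C-level analogue of the two load
patterns. It is only a sketch, not the NetBSD in4_cksum code. The
function names (sum_words_bump, sum_words_offsets, fold_sum) are
hypothetical. Whether a compiler actually emits lwz/lwzu for these
loops, and how it schedules them, depends on the compiler and
optimization level. It also swaps in a 64-bit accumulator with a final
fold in place of the addc/adde carry chain, and uses four separate
accumulators so the adds do not form one serial chain either. That
part is my own addition, not something from the patch above.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 64-bit sum of 32-bit words down to a 16-bit one's-complement
 * sum, adding the end-around carries back in at each step. */
static uint16_t fold_sum(uint64_t sum)
{
    sum = (sum & 0xffffffffu) + (sum >> 32); /* at most 33 bits */
    sum = (sum & 0xffffffffu) + (sum >> 32); /* now fits in 32 bits */
    sum = (sum & 0xffff) + (sum >> 16);      /* at most 17 bits */
    sum = (sum & 0xffff) + (sum >> 16);      /* now fits in 16 bits */
    return (uint16_t)sum;
}

/* Pointer-bump style: each load goes through a pointer that the
 * previous iteration just advanced, like a chain of lwzu rX,4(base). */
static uint64_t sum_words_bump(const uint32_t *p, size_t nwords)
{
    uint64_t sum = 0;
    while (nwords--)
        sum += *p++;
    return sum;
}

/* Fixed-offset style: four loads at constant offsets from one base,
 * then a single pointer update per group, mirroring the
 * lwz 4/8/12 + lwzu 16 sequence.  A 64-bit accumulator cannot
 * overflow until roughly 2^32 words have been added. */
static uint64_t sum_words_offsets(const uint32_t *p, size_t nwords)
{
    uint64_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    while (nwords >= 4) {
        s0 += p[0];
        s1 += p[1];
        s2 += p[2];
        s3 += p[3];
        p += 4;
        nwords -= 4;
    }
    while (nwords--)        /* leftover words */
        s0 += *p++;
    return s0 + s1 + s2 + s3;
}
```

Both versions compute the same sum. Whether the unrolled form is
actually faster, which was Martin's question, still has to be measured
on the target CPU.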


-- 
Matt Thomas               Internet:   matt@3am-software.com
3am Software Foundry      WWW URL:    http://www.3am-software.com/bio/matt/
Cupertino, CA             Disclaimer: I avow all knowledge of this message