Port-powerpc archive


Re: about powerpc version of in{,4}_cksum



At 11:01 AM 7/30/2002, Martin J. Laubach wrote:
|  !                            "lwzu 7,4(%2);"
|  !                            "lwzu 8,4(%2);"
|  !                            "lwzu 9,4(%2);"
|  !                            "lwzu 10,4(%2);"

  BTW, does it pay to unroll the loop? While playing with bzero
I noticed that an unrolled version was somewhat slower than the
straightforward loop on my G4, probably due to cache intricacies.
How is that for in_cksum?

Another reason not to use lwzu is that it forces each instruction
to depend on the previous one.  That makes your pipeline stall.

        lwz %r7,4(%r2)
        lwz %r8,8(%r2)
        lwz %r9,12(%r2)
        lwzu %r10,16(%r2)

That would be better because the loads no longer depend on one another
and can be executed in parallel (depending on the number of load/store
units).
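
To make the dependency argument concrete, here is a rough C sketch of
the two access patterns. It is only an illustration, not the in_cksum
code under discussion: the function names are made up, it sums whole
32-bit words without any carry folding, it assumes the word count is a
multiple of four, and it uses post-increment where lwzu pre-increments.
An optimizing compiler may well emit the same code for both, so read it
as a picture of the data flow rather than a benchmark.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Chained form (hypothetical name): every load goes through a pointer
 * that the previous step has just updated, much like four lwzu
 * instructions in a row. */
uint64_t sum_words_chained(const uint32_t *p, size_t nwords)
{
    uint64_t sum = 0;

    assert(nwords % 4 == 0);    /* assumption: whole groups of four */
    for (size_t i = 0; i < nwords; i += 4) {
        sum += *p++;            /* load, then update the pointer */
        sum += *p++;            /* address depends on the update above */
        sum += *p++;
        sum += *p++;
    }
    return sum;
}

/* Offset form (hypothetical name): four loads at fixed offsets from one
 * unchanged base, then a single pointer update, like lwz/lwz/lwz/lwzu. */
uint64_t sum_words_offsets(const uint32_t *p, size_t nwords)
{
    uint64_t sum = 0;

    assert(nwords % 4 == 0);    /* assumption: whole groups of four */
    for (size_t i = 0; i < nwords; i += 4) {
        uint32_t a = p[0];      /* these four addresses are independent */
        uint32_t b = p[1];
        uint32_t c = p[2];
        uint32_t d = p[3];

        p += 4;                 /* one base update per group */
        sum += (uint64_t)a + b + c + d;
    }
    return sum;
}
```

Both functions return the same total for the same input; the only
difference is whether each address has to wait for the previous pointer
update.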


--
Matt Thomas               Internet:   matt%3am-software.com@localhost
3am Software Foundry      WWW URL:    http://www.3am-software.com/bio/matt/
Cupertino, CA             Disclaimer: I avow all knowledge of this message



