Re: about powerpc version of in{,4}_cksum

To: Simon Burge <simonb%wasabisystems.com@localhost>
Subject: Re: about powerpc version of in{,4}_cksum
From: enami tsugutomo <enami%but-b.or.jp@localhost>
Date: Wed, 31 Jul 2002 00:05:15 +0900 (JST)

> > 1) addze 7,7 or addze %1,%1 are used to clear carry bit, but they
> >    aren't correct.  The former fails to clear if junk register r7
> >    happen to contain 0xffffffff, and the latter may crobber non-junk
> >    (i.e., necessary) register.
> 
> I'm not sure that "addic 0,0,0" is going to do the right thing either
> for clearing the carry.  For roughly this code fragment
> 
> >                       n = mlen >> 6;
> > !                     __asm __volatile(
> > !                         "addic 0,0,0;"              /* clear carry */
> > !                         "mtctr %1;"                 /* load loop count */
> > !                         "1:"
> > !                         "lwz 7,4(%2);"              /* load current data
> 
> here's the "objdump -d"
> 
>        12c:   7f e0 36 70     srawi   r0,r31,6
>        130:   39 66 ff fc     addi    r11,r6,-4
>        134:   30 00 00 00     addic   r0,r0,0
>        138:   7c 09 03 a6     mtctr   r0
>        13c:   80 eb 00 04     lwz     r7,4(r11)
> 
> gcc has put "n" in r0, so the "addic 0,0,0" will add the carry bit to
> "n".  Allen suggested the following:
> 
>       addi 7,0,0; addic 7,7,0;
> 
> "make sure r7 is zero, then add 0 to it w/ carry".  We choose r7 since
> that is one of the regs we mark as clobbered.

`addic rD, rA, SIMMM' is defined as (as far as according to my
manual):

        rD <- (rA) + EXTS(SIMM)

So, no carry is added (but carry is affected), and if SIMM is zero,
carry bit always cleared.  So, addic 0,0,0 should be nop except that
it clears carry bit.  Am I losing something?

> > 2) When adjusting to 4 byte boundary, just adding 16bit value to the
> >    variable `sum' isn't enough, since the `sum' may have full 32bit
> >    value there, depending on how a packet is divided into mbufs.  So,
> >    we need to care carry bit.  This actually prevented my Mac
> >    (g4-500dp) from netbooting.  (we can REDUCE instead but it results
> >    longer instructions).
> 
> I wonder if just:
> 
>               if ((3 & (long) w) && (mlen > 0)) {
>                       REDUCE1;
>                       if ((1 & (long) w)) {
>                               sum <<= 8;
>                               s_util.c[0] = *w++;
>                               mlen--;
>                               byte_swapped = 1;
>                       }
>                       if ((2 & (long) w) && (mlen > 1)) {
>                               sum += *(uint16_t *)w;
>                               w += 2;
>                               mlen -= 2;
>                       }
>               }
> 
> wouldn't be better?  It also occurs to me that only the last REDUCE
> has to be a REDUCE; the others can be a REDUCE1 - we don't care in the
> intermediate code whether we've reduced to a 16-bit or 17-bit value.

The difference between my asm() and sum += *(...) is just addc;addze
vs add.  And I guess REDUCE1 is much instruction than the differnce,
and I'm not sure if how the odd byte case is so familier comparing to
2 byte unalingned case.

> > 3) In asm statemnt, constraint letter "b" (base register) should be
> >    used instead of "r" for pointer operand.
> 
> Ok, I didn't see the ppc constraints.  Does this make any real-world
> differences?

Please read the rs6000.h.  The differnce is whether r0 can be used or
not.  And if you read my diff, you will notoice that forcing r0 for
the variable `n' is no longer necessary.

> That command line is 2000000 iterations over ten different random 1532
> byte mbufs, with a random alignment (the first column is "alignment %
> 4") for each mbuf.  So for the 405GP, there's ~no difference in speed.
> Given that there is a marked difference with your Mac, it would seem
> that switching to non-update loads seems like a sound decision.  It
> would be interesting to get benchmarks for other CPU models.

Could you please provide me your test case?  As I wrote in my mail, I
just test only single pattern.

enami.

Follow-Ups:
- Re: about powerpc version of in{,4}_cksum
  - From: Simon Burge

Prev by Date: Re: about powerpc version of in{,4}_cksum
Next by Date: Re: about powerpc version of in{,4}_cksum
Previous by Thread: Re: about powerpc version of in{,4}_cksum
Next by Thread: Re: about powerpc version of in{,4}_cksum
Indexes:

Home | Main Index | Thread Index | Old Index