Subject: Re: 4xx copyin/copyout [Was: CVS commit: src/sys/arch/powerpc/ibm4xx]
To: Simon Burge <simonb@NetBSD.org>
From: Herb Peyerl <hpeyerl@beer.org>
List: port-powerpc
Date: 11/27/2007 08:42:04

Simon Burge wrote:
> Juergen Hannken-Illjes wrote:
>
>   
>> This breaks with alignment trap on the 1st copyout for evbppc/explora:
>> 	kaddr=0x2fda88 udaddr=0xfffebff5 len=11
>>
>> Trap is at this line:
>>
>> 	"   stw %[tmp],0(%[udaddr]);"       /* Store user word */
>>     
> If this works, we can possibly also look at unrolling the word loop a
> bit since the load/store string instructions can do up to 32-bytes per
> instruction.  Does anyone know how to request a number of consecutive
> registers with gcc asm constraints?  If not, it might be easier to break
> these two assembly fragments out to their own .S file...
>
> If this patch doesn't fix the Explora, we can just add a alignment check
> (if on a 403) and skip the word-at-a-time loop, although the trailing
> loop will need to be updated to not just use "len % 4" bytes.
>   

Sorry for the breakage, Juergen ...

I briefly tried a minor unroll, but it didn't work on the first attempt
and I had other problems to deal with, so I punted ... It was decidedly
less than elegant, but I just wanted to measure the performance impact
to see whether it was worth the effort ...
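
For readers following along, the control flow that this kind of unroll
is after can be sketched in plain C. This is only an illustration under
stated assumptions, not the attempted assembly further down and not the
committed kernel code: copy_words_unrolled is a hypothetical name, both
pointers are assumed to be word aligned, and the PID switching
(mtpid/sync) between the load and store phases is left out entirely.
The first pass handles the count % 4 leftover words, and every later
pass moves four.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the NetBSD tree): copy `count` 32-bit
 * words, at most four per pass.  All loads of a pass happen before any
 * store, mirroring the load-under-one-PID / store-under-another shape.
 */
static void
copy_words_unrolled(uint32_t *dst, const uint32_t *src, size_t count)
{
    uint32_t t1 = 0, t2 = 0, t3 = 0, t4 = 0;
    size_t n = count & 3;       /* words in the first, partial pass */

    assert(count == 0 || (dst != NULL && src != NULL));
    if (n == 0)
        n = 4;                  /* count is a multiple of 4: full pass */

    while (count > 0) {
        /* Load phase: enter at the case matching this pass's size. */
        switch (n) {
        case 4: t4 = src[3];    /* FALLTHROUGH */
        case 3: t3 = src[2];    /* FALLTHROUGH */
        case 2: t2 = src[1];    /* FALLTHROUGH */
        default: t1 = src[0];
        }
        /* Store phase: same entry point, same registers. */
        switch (n) {
        case 4: dst[3] = t4;    /* FALLTHROUGH */
        case 3: dst[2] = t3;    /* FALLTHROUGH */
        case 2: dst[1] = t2;    /* FALLTHROUGH */
        default: dst[0] = t1;
        }
        src += n;
        dst += n;
        count -= n;
        n = 4;                  /* every later pass is a full one */
    }
}
```

Note that the sketch advances both pointers and decrements the count by
the number of words actually moved in each pass; an assembly version
needs the equivalent bookkeeping for the loop to terminate correctly.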


        "    srwi %[count],%[len],0x2;"
        "    beq- 2f;"
        "1:  mtpid %[pid];sync;"
        "    andi. %[tmp],%[count],3;"
        "    beq 111f;"
        "    andi. %[tmp],%[count],2;"
        "    beq 110f;"
        "    andi. %[tmp],%[count],1;"
        "    beq 101f;"
        "    b 100f;"
        "111:lwz %[tmp4],12(%[kaddr]);"
        "110:lwz %[tmp3],8(%[kaddr]);"
        "101:lwz %[tmp2],4(%[kaddr]);"
        "100:lwz %[tmp1],0(%[kaddr]);"
        "    sync; isync;"
        "    mtpid %[ctx]; sync;"
        "    andi. %[tmp],%[count],3;"
        "    beq 211f;"
        "    andi. %[tmp],%[count],2;"
        "    beq 210f;"
        "    andi. %[tmp],%[count],1;"
        "    beq 201f;"
        "    b 200f;"
        "211:stw %[tmp4],12(%[udaddr]);"
        "210:stw %[tmp3],8(%[udaddr]);"
        "201:stw %[tmp2],4(%[udaddr]);"
        "200:stw %[tmp1],0(%[udaddr]);"
        "   subfic %[count], %[count], %[tmp];"
        "   bne 1b;"