Subject: Re: 4xx copyin/copyout [Was: CVS commit: src/sys/arch/powerpc/ibm4xx]
To: Simon Burge <simonb@NetBSD.org>
From: Herb Peyerl <hpeyerl@beer.org>
List: port-powerpc
Date: 11/27/2007 08:42:04
Simon Burge wrote:
> Juergen Hannken-Illjes wrote:
>
>
>> This breaks with alignment trap on the 1st copyout for evbppc/explora:
>> kaddr=0x2fda88 udaddr=0xfffebff5 len=11
>>
>> Trap is at this line:
>>
>> " stw %[tmp],0(%[udaddr]);" /* Store user word */
>>
> If this works, we can possibly also look at unrolling the word loop a
> bit since the load/store string instructions can do up to 32-bytes per
> instruction. Does anyone know how to request a number of consecutive
> registers with gcc asm constraints? If not, it might be easier to break
> these two assembly fragments out to their own .S file...
>
> If this patch doesn't fix the Explora, we can just add an alignment check
> (if on a 403) and skip the word-at-a-time loop, although the trailing
> loop will need to be updated to not just use "len % 4" bytes.
>
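As a rough sketch of the alignment check suggested above (plain C, not the
kernel code; the struct and function names are made up for illustration),
the split between the word loop and the byte loop could be decided like
this. It assumes both addresses must be 4-byte aligned before the word loop
is used, which may be stricter than some 4xx cores actually need:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: how many bytes each loop should handle. */
struct copy_plan {
	size_t words;	/* iterations of the word-at-a-time loop */
	size_t tail;	/* bytes left for the byte-at-a-time loop */
};

/*
 * Decide how to split a copy of len bytes.  If either address is not
 * word-aligned, skip the word loop entirely and let the byte loop copy
 * all len bytes (not just len % 4).
 */
static struct copy_plan
plan_copy(uintptr_t kaddr, uintptr_t udaddr, size_t len)
{
	struct copy_plan p;

	if (((kaddr | udaddr) & 3) != 0) {
		p.words = 0;
		p.tail = len;
	} else {
		p.words = len >> 2;
		p.tail = len & 3;
	}
	return p;
}
```

For the failing case quoted above (kaddr=0x2fda88 udaddr=0xfffebff5 len=11),
udaddr is not word-aligned, so this yields words=0 and tail=11 and the whole
copy goes through the byte loop.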
Sorry for the breakage Juergen ...
I briefly tried a minor unroll, but it didn't work on the first attempt
and I had other problems to deal with, so I punted. It was decidedly
less than elegant, but I just wanted to measure the impact on performance
to see if it was worth the effort.
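
For reference, the fragment appears intended to move up to four words per
pass: load them after "mtpid %[pid]", switch with "mtpid %[ctx]", then store
them, entering each lwz/stw chain at a point chosen from the remaining word
count. A plain C model of that idea might look like the following. This is
only a sketch of the intended control flow under that reading; the function
name is hypothetical and the PID switching is reduced to comments:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference model of a 4-word unrolled copy. */
static void
copy_words_unrolled(uint32_t *dst, const uint32_t *src, size_t count)
{
	uint32_t t1 = 0, t2 = 0, t3 = 0, t4 = 0;

	while (count > 0) {
		/* Words handled in this pass: at most four. */
		size_t n = count < 4 ? count : 4;

		/* Load phase (the asm does this after "mtpid %[pid]"). */
		switch (n) {
		case 4: t4 = src[3]; /* FALLTHROUGH */
		case 3: t3 = src[2]; /* FALLTHROUGH */
		case 2: t2 = src[1]; /* FALLTHROUGH */
		default: t1 = src[0];
		}

		/* Store phase (the asm does this after "mtpid %[ctx]"). */
		switch (n) {
		case 4: dst[3] = t4; /* FALLTHROUGH */
		case 3: dst[2] = t3; /* FALLTHROUGH */
		case 2: dst[1] = t2; /* FALLTHROUGH */
		default: dst[0] = t1;
		}

		/* Advance both pointers by the words actually copied. */
		src += n;
		dst += n;
		count -= n;
	}
}
```

The inline assembly attempt (the one that didn't work) follows:
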
" srwi %[count],%[len],0x2;"
" beq- 2f;"
"1: mtpid %[pid];sync;"
" andi. %[tmp],%[count],3;"
" beq 111f;"
" andi. %[tmp],%[count],2;"
" beq 110f;"
" andi. %[tmp],%[count],1;"
" beq 101f;"
" b 100f;"
"111:lwz %[tmp4],12(%[kaddr]);"
"110:lwz %[tmp3],8(%[kaddr]);"
"101:lwz %[tmp2],4(%[kaddr]);"
"100:lwz %[tmp1],0(%[kaddr]);"
" sync; isync;"
" mtpid %[ctx]; sync;"
" andi. %[tmp],%[count],3;"
" beq 211f;"
" andi. %[tmp],%[count],2;"
" beq 210f;"
" andi. %[tmp],%[count],1;"
" beq 201f;"
" b 200f;"
"211:stw %[tmp4],12(%[udaddr]);"
"210:stw %[tmp3],8(%[udaddr]);"
"201:stw %[tmp2],4(%[udaddr]);"
"200:stw %[tmp1],0(%[udaddr]);"
" subfic %[count], %[count], %[tmp];"
" bne 1b;"