Subject: Re: 4xx copyin/copyout [Was: CVS commit: src/sys/arch/powerpc/ibm4xx]
To: Simon Burge <simonb@netbsd.org>
From: Juergen Hannken-Illjes <hannken@eis.cs.tu-bs.de>
List: source-changes
Date: 11/28/2007 12:03:46
On Wed, Nov 28, 2007 at 02:22:13AM +1100, Simon Burge wrote:
> Juergen Hannken-Illjes wrote:
>
> > On Thu, Nov 22, 2007 at 01:33:08PM +0000, Herb Peyerl wrote:
> > >
> > > Module Name: src
> > > Committed By: hpeyerl
> > > Date: Thu Nov 22 13:33:08 UTC 2007
> > >
> > > Modified Files:
> > > src/sys/arch/powerpc/ibm4xx: trap.c
> > >
> > > Log Message:
> > > Optimize copyin/copyout to transfer as many words as possible before doing
> > > residual bytes. This improves small transfers. As a result, we can avoid
> > > doing bigcopyin/bigcopyout until len>1024 instead of len>256.
> > >
> > > Reviewed by: simonb.
> > >
> > > (everybody run, Herb's in the kernel again).
> >
> > This breaks with alignment trap on the 1st copyout for evbppc/explora:
> >
> > kaddr=0x2fda88 udaddr=0xfffebff5 len=11
> >
> > Trap is at this line:
> >
> > " stw %[tmp],0(%[udaddr]);" /* Store user word */
>
> The 405 core manual says that the only instructions that can issue
> alignment traps are dcbz, dcread, lwarx and stwcx. I found a manual
> for the 403CGX and it says "All data operands referenced by the Storage
> Reference instructions (loads/stores) must be aligned on a corresponding
> operand-size boundary." so that explains why I didn't see any issues when I
> tested Herb's patches on a 405 Walnut.
>
> I've just tried using lswi (Load String Word Immediate) in place of
> lwz (and similar for the stores) to fix this without any significant
> rewriting, and on a Walnut it gets the same performance as just using
> lwz/stw. Does the trailing patch work for you on the Explora?
>
> If this works, we can possibly also look at unrolling the word loop a
> bit since the load/store string instructions can do up to 32 bytes per
> instruction. Does anyone know how to request a number of consecutive
> registers with gcc asm constraints? If not, it might be easier to break
> these two assembly fragments out to their own .S file...
>
> If this patch doesn't fix the Explora, we can just add an alignment check
> (if on a 403) and skip the word-at-a-time loop, although the trailing
> loop will need to be updated to not just use "len % 4" bytes.
Patch works fine. Explora comes up multi-user without problems.
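
For reference, the fallback mentioned in the last quoted paragraph (check
alignment, skip the word loop, let the trailing loop copy everything) could
look roughly like the sketch below. This is plain portable C, not the
inline-assembly code in trap.c and not the lswi patch; the function name
copy_words_then_bytes and the need_aligned flag are made up for
illustration. With the failing values above, udaddr=0xfffebff5 has low bits
0xf5 & 3 = 1, so the word loop would be skipped and all 11 bytes would go
through the byte loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not from NetBSD): copy len bytes from src to dst,
 * using 4-byte transfers where permitted and single bytes for the rest.
 * need_aligned models a core such as the 403 that traps on unaligned
 * word accesses. Returns the number of whole words copied.
 */
static size_t
copy_words_then_bytes(void *dst, const void *src, size_t len, int need_aligned)
{
	unsigned char *d = dst;
	const unsigned char *s = src;
	size_t words = 0;

	assert(dst != NULL && src != NULL);

	/* Use the word loop only if the core tolerates unaligned access
	 * or both pointers sit on a 4-byte boundary. */
	if (!need_aligned || ((((uintptr_t)d | (uintptr_t)s) & 3) == 0)) {
		while (len >= 4) {
			uint32_t tmp;

			memcpy(&tmp, s, sizeof(tmp));	/* "load word" */
			memcpy(d, &tmp, sizeof(tmp));	/* "store word" */
			s += 4;
			d += 4;
			len -= 4;
			words++;
		}
	}

	/* Trailing loop: copies whatever is left, which is the full
	 * length (not just len % 4) when the word loop was skipped. */
	while (len > 0) {
		*d++ = *s++;
		len--;
	}
	return words;
}
```

Keeping the residual loop driven by the remaining length, rather than by
"len % 4", is what lets the same loop serve both cases.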
--
Juergen Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig (Germany)