Subject: Re: 4xx copyin/copyout [Was: CVS commit: src/sys/arch/powerpc/ibm4xx]
To: Simon Burge <simonb@netbsd.org>
From: Juergen Hannken-Illjes <hannken@eis.cs.tu-bs.de>
List: port-powerpc
Date: 11/28/2007 12:03:46
On Wed, Nov 28, 2007 at 02:22:13AM +1100, Simon Burge wrote:
> Juergen Hannken-Illjes wrote:
> 
> > On Thu, Nov 22, 2007 at 01:33:08PM +0000, Herb Peyerl wrote:
> > > 
> > > Module Name:	src
> > > Committed By:	hpeyerl
> > > Date:		Thu Nov 22 13:33:08 UTC 2007
> > > 
> > > Modified Files:
> > > 	src/sys/arch/powerpc/ibm4xx: trap.c
> > > 
> > > Log Message:
> > > Optimize copyin/copyout to transfer as many words as possible before doing
> > > residual bytes. This improves small transfers. As a result, we can avoid
> > > doing bigcopyin/bigcopyout until len>1024 instead of len>256.
> > > 
> > > Reviewed by: simonb.
> > > 
> > > (everybody run, Herb's in the kernel again).
> > 
> > This breaks with alignment trap on the 1st copyout for evbppc/explora:
> > 
> > 	kaddr=0x2fda88 udaddr=0xfffebff5 len=11
> > 
> > Trap is at this line:
> > 
> > 	"   stw %[tmp],0(%[udaddr]);"       /* Store user word */
> 
> The 405 core manual says that the only intructions that can issue
> alignment traps are dcbz, dcread, lwarx and stwcx.  I found a manual
> for the 403CGX and it says "All data operands referenced by the Storage
> Reference instructions (loads/stores) must be aligned on a corresponding
> operand-size boundary." so that explains why I did see any issues when I
> tested Herb's patches on a 405 Walnut.
> 
> I've just tried using lswi (Load String Word Immediate) in place if
> lwz (and similar for the stores) to fix this without any significant
> rewriting, and on a Walnut it gets the same performance as just using
> lwz/stw.  Does the trailing patch work for you on the Explora?
> 
> If this works, we can possibly also look at unrolling the word loop a
> bit since the load/store string instructions can do up to 32-bytes per
> instruction.  Does anyone know how to request a number of consecutive
> registers with gcc asm constraints?  If not, it might be easier to break
> these two assembly fragments out to their own .S file...
> 
> If this patch doesn't fix the Explora, we can just add a alignment check
> (if on a 403) and skip the word-at-a-time loop, although the trailing
> loop will need to be updated to not just use "len % 4" bytes.

Patch works fine.  Explora comes up multi-user without problems.

-- 
Juergen Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig (Germany)