port-arm32: Re: Port of NetBSD to XScale

Subject: Re: Port of NetBSD to XScale
To: Reinoud Zandijk <imago@kabel065011.kabel.utwente.nl>
From: Chris Gilbert <chris@paradox.demon.co.uk>
List: port-arm32
Date: 03/29/2001 09:17:52

On Wednesday 28 March 2001  1:46 pm, Reinoud Zandijk wrote:
> Hi John,
>
> On Tue, 27 Mar 2001, John Clark wrote:
> > I'm interested if anyone has begun a port to the XScale, Intel's rename
> > of the Strong Arm 2 processor.
> >
> > I have a just beginning to work linux port, but I'd like a more BSD
> > style derivative for potential future projects.
>
> Well i'm also interested in porting NetBSD to a Xscale machine but that
> machine hasnt been out yet :( ... i think i'll be more working on
> supporting this machine and to release its full potentional rather than on
> the toolchains though for i would first have to dig into that :)) ...

There will need to be some work on the toolchain to get performance up.  The
pipeline on this thing is 7 stages long, which means that LDRs cause a longer
stall in the code.  (SWP is lethal on it as it's about 4 cycles long now, but
that said, the only major use for SWP is in mutexes and locks.)
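
For anyone who hasn't met SWP: it's an atomic exchange of a register with a
memory word, which is why locks are about the only place it turns up.  Here's
a rough sketch of that pattern in portable C, with C11's atomic_exchange
standing in for the SWP.  The swp_lock_* names are made up for illustration;
this is not NetBSD's lock code, and I'm not claiming anything about what a
given compiler emits for it on ARM.

```c
#include <stdatomic.h>

/* Hypothetical lock word: 0 = free, 1 = held. */
typedef struct {
    atomic_int held;
} swp_lock_t;

static void swp_lock_init(swp_lock_t *l)
{
    atomic_init(&l->held, 0);
}

/*
 * One atomic exchange, playing the role of a single SWP: store 1 and get
 * the old value back.  Returns 1 if we took the lock, 0 if it was held.
 */
static int swp_lock_try(swp_lock_t *l)
{
    return atomic_exchange_explicit(&l->held, 1, memory_order_acquire) == 0;
}

static void swp_lock_acquire(swp_lock_t *l)
{
    while (!swp_lock_try(l)) {
        /*
         * Wait on a plain load, so the expensive exchange is only
         * retried once the lock looks free.
         */
        while (atomic_load_explicit(&l->held, memory_order_relaxed) != 0)
            ;
    }
}

static void swp_lock_release(swp_lock_t *l)
{
    atomic_store_explicit(&l->held, 0, memory_order_release);
}
```

The point is that the uncontended path costs exactly one exchange, so a slow
SWP hurts every lock acquisition but nothing else.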

Branching looks to be worse than ever: 4 cycles for a mispredicted branch, or
0 if it's predicted by the branch prediction buffer.  The predictor doesn't
see the standard MOV PC, LR return idiom; I suspect that doing BX LR will
help it there.

In fact, looking at the timing of things, this:
LDR R14, [R2]   (takes 1 issue cycle and stalls 3 for the data result)
BX R14          (takes 1 issue cycle if predicted, or 5 if not)
could be faster than:
LDR PC, [R2]    (takes a minimum of 8 issue cycles)
if the prediction buffer has seen this branch before.  I'd also assume that
it will look at the branch and decide that it's always going to be taken
(but the docs don't seem to state this).
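
To spell out the arithmetic, here's a small sketch in C that just adds up the
figures quoted above.  The names are made up for illustration, and it assumes
the 3 stall cycles aren't hidden behind other instructions and that the load
to PC costs exactly its 8-cycle minimum; none of this is measured.

```c
/* Cycle figures as quoted in the text above; estimates, not measurements. */
enum {
    LDR_ISSUE        = 1,  /* issue cycle for the load */
    LDR_STALL        = 3,  /* stall waiting for the loaded value */
    BRANCH_PREDICTED = 1,  /* register branch, predicted */
    BRANCH_MISSED    = 5,  /* register branch, not predicted */
    LDR_PC_MIN       = 8   /* minimum for loading straight into PC */
};

/* Cost in cycles of the load-then-branch pair for one execution. */
static int pair_cycles(int predicted)
{
    return LDR_ISSUE + LDR_STALL +
        (predicted ? BRANCH_PREDICTED : BRANCH_MISSED);
}

/*
 * Expected cost of the pair in hundredths of a cycle, given the
 * percentage of executions where the branch is predicted.
 */
static int pair_cycles_x100(int hit_percent)
{
    return hit_percent * pair_cycles(1) +
        (100 - hit_percent) * pair_cycles(0);
}
```

With these numbers the pair costs 5 cycles when predicted and 9 when not, so
it only beats the 8-cycle load to PC if the branch is predicted more than a
quarter of the time (pair_cycles_x100(25) comes out at exactly 800).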

One potentially useful feature looks to be the ability to lock memory into
the instruction cache, e.g. we could lock the page 0 vectors into the cache.

Of course the XScale looks to be clocked at nearly twice the speed of the SA,
so it will probably seem slightly faster :)

Cheers,
Chris