Subject: Re: Port of NetBSD to XScale
To: Charles M. Hannum <root@ihack.net>
From: Richard Earnshaw <rearnsha@arm.com>
List: port-arm32
Date: 03/29/2001 11:08:13
> 
> On Thu, Mar 29, 2001 at 09:17:52AM +0100, Chris Gilbert wrote:
> > 
> > Branching looks to be worse than ever at 4 cycle for a branch miss, or 0 if 
> > it's predicted by the branch prediction buffer, it doesn't see the standard 
> > MOV PC, LR to return method, I suspect that doing B LR will help it there.
> 
> Er, are you saying `mov pc, lr' always causes a 4-cycle stall?  That
> would an *amazing* f*ck*p.  Wow.
> 

According to the documentation I have, XScale only predicts B and BL
instructions, both of which have fixed, pc-relative offsets.  A
`mov pc, lr' return is therefore never predicted.  Any mis-predicted
(or unpredicted) branch takes at least 5 cycles to issue (8 if the
value has to come from memory). [XScale Developers Manual, Table 14-4]

So I don't think there are any coding tricks to speed this up, other than 
to avoid code like

ldr	pc, [addr]

when it can be reasonably split into 

ldr	reg, [addr]
<other instructions>
mov	pc, reg

which can save a couple of cycles.
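
As a rough sketch of that split for an indirect jump through a pointer
(the registers r0, r1 and r3 and the filler add are made up for
illustration; they are not taken from the manual or from any NetBSD
source):

	ldr	r3, [r0]	@ fetch the branch target early
	add	r1, r1, #1	@ independent work while the load completes
	mov	pc, r3		@ still unpredicted, but the target is in a register

The branch is still not predicted, so this only avoids the extra
from-memory cost; how much it actually gains depends on how much
independent work can be placed between the load and the branch.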