Subject: Re: RFC: Change SWI number base?
To: None <Richard.Earnshaw@arm.com>
From: David Laight <David.Laight@btinternet.com>
List: port-arm
Date: 01/09/2002 12:15:16
> > The 'pc' value when the 'bx pc' is done MUST be a multiple of 4.
> 
> > Apparently (in spite of what the ARM ARM may have said) some cpus
> > don't ignore bit 1 of the pc when doing pc relative loads - so
> > find all your constants rotated by 16 bits :-)
> 
> Eh?  Can you cite examples?

The following has a 50% chance of success (on some cpus):

    .code16
    ...
go_32:
    bx pc
    nop
    .code32
    .balign 4
code_32:
    ldr r0,=0x12345    /* [pc,#nn] */

If go_32 is at address 4n+2, code_32 will be at 4n+8, and the 'pc' when the
'bx pc' is executed will be '.+4', i.e. 4n+6. In 32-bit mode bits 0 and 1
of the pc are ignored when fetching instructions, so the first fetch
is the word at 4n+4: the thumb 'nop' and the pad! - hopefully a nop.  But
the pc used in the pc-relative load will have bit 1 set.....
> 
> > If (many) of the syscall hooks are in one file, making the final sequence:
> > 
> >     swi nnn
> >     bxcc lr
> >     b __go_cerror
> > 
> > __go_cerror:
> >     ldr r12,=__cerror
> >     bx r12
> > 
> > saves a few bytes and makes each hook 16 bytes - so they fit nicely
> > into cache lines. 
> 
> But costs an extra non-predictable branch (very expensive on XScale).  I'm 
> not suggesting that we should look at something like this, but the cost 
> has to be borne in mind.

These sorts of costs are very difficult to quantify!
However, I don't think that any optimisation of the syscall fail path
will be a gain if it lengthens the success path.
> 
> > Now work out the optimal order for the hooks, then
> > get the .balign 32 to work (it doesn't in the arm a.out build I've used).
> 
> a.out object files on ARM only maintain the sections to 4-byte alignment, 
> so will ignore attempts to force greater alignment.  The linker simply 
> concatenates each similar section ensuring that they start on a 4-byte 
> boundary.

As I guessed - it made the 32-byte alignment of the code tables for the Java
byte code interpreter and the integer divide routine somewhat sub-optimal!

    David