Subject: Re: Port of NetBSD to XScale
To: Ignatios Souvatzis <is@netbsd.org>
From: Reinoud Zandijk <imago@kabel065011.kabel.utwente.nl>
List: port-arm32
Date: 03/30/2001 14:35:22

Hiya all,

On Thu, 29 Mar 2001, Ignatios Souvatzis wrote:
> > > > According to the documentation I have, Xscale only predicts B and BL
> > > > instructions, both of which only have pc-relative invariant offsets.  Any
> > > > mis-predicted (or unpredicted) branch takes at least 5 cycles to issue (8
> > > > if the value has to come from memory). [XScale Developers Manual, Table
> > > > 14-4]
> > >
> > > So a function return always takes 5 clock cycles??  Was this thing
> > > developed by the same group that did the P4, perchance??
> >
> > Shit, and another 5 cycles for every PIC function call.  This is gonna
> > suck a lot.
>
> Maybe they want to make sure it is only used for embedded applications, not
> for general OSes.

Well, the disadvantages are not as big as we are suggesting here. If one
looks at where code actually spends its time, loops are the main
consumers, and these are strongly favoured by branch prediction: no
branch delays or pipeline flushes on the correctly predicted iterations.
This means that the 5 clock cycles one loses for a PIC function call are
easily won back by the first (biggish) loop we encounter. For a loop of
1000 iterations the branch overhead is 5 + 998*0 + 5 = 10 cycles instead
of 1000*2 = 2000 cycles at 2 cycles per taken branch, so the overall
loss of speed is smaller than this thread makes it sound.
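
To make that arithmetic easy to play with, here is a rough sketch in
Python. It is only a toy model, not a measurement: the function and
constant names are made up for illustration, the 5-cycle penalty is the
figure quoted earlier in this thread, and the 2-cycle cost per branch
without prediction is the assumption behind the 1000*2 figure.

```python
# Toy model of loop branch overhead. All names are hypothetical; the
# cycle counts are the ones used in the arithmetic above.

MISPREDICT_PENALTY = 5   # cycles lost on a mispredicted branch
PREDICTED_COST = 0       # extra cycles for a correctly predicted branch
UNPREDICTED_COST = 2     # assumed cycles per branch without prediction


def overhead_with_prediction(iterations, mispredicts=2):
    """Branch overhead when only `mispredicts` loop branches miss
    (by default the first and the last iteration)."""
    mispredicts = min(mispredicts, iterations)
    predicted = iterations - mispredicts
    return mispredicts * MISPREDICT_PENALTY + predicted * PREDICTED_COST


def overhead_without_prediction(iterations):
    """Branch overhead when every loop branch pays a fixed cost."""
    return iterations * UNPREDICTED_COST


def break_even_iterations(mispredicts=2):
    """Smallest iteration count at which prediction is strictly cheaper."""
    n = 1
    while overhead_with_prediction(n, mispredicts) >= overhead_without_prediction(n):
        n += 1
    return n


if __name__ == "__main__":
    print(overhead_with_prediction(1000))     # 5 + 998*0 + 5
    print(overhead_without_prediction(1000))  # 1000*2
    print(break_even_iterations())
```

Under these assumed numbers the loop only needs 6 iterations before
prediction comes out ahead, so even fairly small loops pay back the
penalty of a mispredicted call or return.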

One easy (big?) optimisation would be to compile the kernel and other
stuff with -O3, since that inlines small functions and thus reduces the
number of calls and returns, and with it the branch overhead.

Cheers,
Reinoud