On Wed, Jan 28, 2009 at 09:43:59PM -0500, Allen Briggs wrote:
> On Thu, Jan 29, 2009 at 11:26:08AM +0900, Masao Uebayashi wrote:
> > I've considered this. We really need to move toward a modular and
> > stable ABI. OTOH we may need tricks to run programs faster on slow
> > computers (older computers, embedded low-power processors, etc.).
> > It'd be a good compromise to make important APIs functions by
> > default, while providing a way to make them "optimized" for speed
> > (inline, less indirection, etc.). In the "optimized" case, the ABI is
> > not kept. So users can choose either ABI stability or speed.
> And it's not quite that simple and straightforward in some ways.
> For vax (I think) inline makes a lot of sense because there's a
> significant function call overhead. For ARM (and others?), inlining
> can be bad in some cases because it increases the code size, which
> can increase the memory footprint of the code and the number of
> instruction cache misses. In that case, functions (and non-unrolled
> loops) *can* actually be better.
For non-superscalar cpus and cpus without significant (or any)
instruction cache, inlining and loop unrolling are probably gains
(if you can afford the code space).
So on a vax or 68xxx inlining and unrolling are probably wins.
On modern cpus with large caches, the ability to execute multiple
instructions in parallel, and memory speeds that are much lower than
execution speed, things are horribly different.
The following (at least) make a difference:
- loop constructs can often be performed in parallel with the
  loop body. With care this can mean that loop unrolling is pointless.
- instruction prefetch and decode will continue through an unconditional
  jump/call (and quite possibly return) without a pipeline stall.
  So subroutine call cost is minimal - apart from argument stacking.
- inlining and unrolling both reduce the likelihood of code being
  in the cache (either because an inlined copy cannot benefit from an
  earlier call to the same code, or simply because the additional code
  has displaced something that will be needed again).
- function calls could easily find the code already in the cache
  from a call somewhere else - particularly true for things like
  mutex code which is called very often.
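To make the unrolling point concrete, here is a minimal sketch of the same loop written rolled and unrolled by four. The function names are made up for illustration, and nothing here measures anything: which version wins depends entirely on the cpu and cache, as described above.

```c
#include <stddef.h>

/* Rolled: small body. On a superscalar cpu the counter update and
 * branch can often execute alongside the add. */
long sum_rolled(const int *v, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Unrolled by four: fewer branches per element, but more instruction
 * bytes to fetch, decode and keep in the instruction cache. */
long sum_unrolled4(const int *v, size_t n)
{
    long s = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        s += (long)v[i] + v[i + 1] + v[i + 2] + v[i + 3];
    for (; i < n; i++)      /* leftover elements when n is not a multiple of 4 */
        s += v[i];
    return s;
}
```

Both return the same result; the only difference is how much code the cpu has to carry around to get it.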
On the downside, inlining can make a function into a 'leaf' which
typically gives the compiler many more registers to play with.
My 'gut feeling' is that it isn't worth inlining anything that
is likely to be longer than the call sequence.
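As a rough sketch of the "function by default, inline on request" compromise suggested earlier in the thread, something like the following could work for a body that really is shorter than a call sequence. FAST_NO_ABI, COUNTER_INLINE, struct counter and counter_bump() are all hypothetical names invented for this example, not existing NetBSD interfaces.

```c
/* FAST_NO_ABI is a hypothetical build option. */
#ifdef FAST_NO_ABI
/* Speed: the body is compiled into every caller, so the layout of
 * struct counter becomes part of what callers depend on. */
#define COUNTER_INLINE static inline
#else
/* Default: one out-of-line function behind a stable symbol. In a real
 * tree the header would carry only the prototype in this case. */
#define COUNTER_INLINE
#endif

struct counter {
    int value;
};

/* Tiny body: about the only kind of code where inlining is an easy call. */
COUNTER_INLINE int counter_bump(struct counter *c)
{
    return ++c->value;
}
```

Callers are written the same way in both builds; only the default build keeps a symbol that a separately compiled module could rely on.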
I remember a problem with VN_RELE() being a #define. It looked quite
simple, but by the time the lock and spl calls had also been inlined
it was gross, and most of the calls were in error paths...
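A toy illustration of that kind of problem, using invented names throughout (struct obj, OBJ_LOCK, OBJ_RELE_MACRO, obj_rele and use_obj are stand-ins, not the real vnode, lock or spl code): the macro looks like one line at the call site, but every use expands to the whole lock/decrement/unlock sequence, while the function form keeps a single shared copy.

```c
/* Hypothetical stand-ins; not the real NetBSD vnode, lock or spl interfaces. */
struct obj {
    int locked;     /* toy lock flag */
    int refcount;   /* references still held */
    int released;   /* set once the last reference is dropped */
};

#define OBJ_LOCK(o)     ((o)->locked = 1)
#define OBJ_UNLOCK(o)   ((o)->locked = 0)

/* Macro form: each use site gets its own copy of the whole body,
 * including whatever OBJ_LOCK/OBJ_UNLOCK themselves expand to. */
#define OBJ_RELE_MACRO(o) do {          \
        OBJ_LOCK(o);                    \
        if (--(o)->refcount == 0)       \
            (o)->released = 1;          \
        OBJ_UNLOCK(o);                  \
    } while (0)

/* Function form: one copy of the body, shared by every caller. */
void obj_rele(struct obj *o)
{
    OBJ_LOCK(o);
    if (--o->refcount == 0)
        o->released = 1;
    OBJ_UNLOCK(o);
}

/* A caller where the release sits on a rarely taken error path:
 * with the function form the cold path costs one call instruction. */
int use_obj(struct obj *o, int fail)
{
    if (fail) {
        obj_rele(o);
        return -1;
    }
    return 0;
}
```

With real locking and spl handling in place of the toy flag, the expanded macro grows much larger, and that growth is repeated at every error path that uses it.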
David Laight: david%l8s.co.uk@localhost