On Thu, Jan 29, 2009 at 09:03:53PM +0000, David Laight wrote:
> For non-superscaler cpus and cpus without significant (or any)
> instruction cache, inlining and loop unrolling are probably gains
> (if you can afford the code space).
> So on a vax or 68xxx inlining and unrolling are probably wins.
Be careful... at least the 68060 is very much RISCy in this regard.
Besides being mildly superscalar, the 68060 has a branch target
cache and can hide even a conditional branch completely if it mostly
goes in one direction. (I verified this while implementing the
new delay loop.) Given the tiny instruction cache, I'd prefer
calls, as long as the overall code size is smaller.
For really tiny inlines where the compiler can make use of overall
optimizations within the caller, this might be different.
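To make the distinction concrete, here is a rough C sketch of the two
cases. The function names are made up for illustration and are not from
any real kernel source, and the noinline attribute is a GCC/Clang
extension. Whether either choice actually wins on a given CPU still has
to be measured.

```c
#include <stdint.h>

/* Hypothetical helper: tiny enough that inlining lets the compiler
 * fold it into the caller. */
static inline uint32_t scale_ticks(uint32_t ticks, uint32_t shift)
{
    return ticks << shift;
}

/* With a constant shift, the inlined body can typically collapse to a
 * single shift instruction, with no call/return overhead. */
uint32_t ticks_to_units(uint32_t ticks)
{
    return scale_ticks(ticks, 3);
}

/* Hypothetical larger helper kept out of line, so its body exists only
 * once in the instruction cache regardless of the number of call sites.
 * __attribute__((noinline)) is a GCC/Clang extension. */
__attribute__((noinline))
uint32_t checksum(const uint8_t *buf, uint32_t len)
{
    uint32_t sum = 0;
    for (uint32_t i = 0; i < len; i++)
        sum = (sum << 1 | sum >> 31) ^ buf[i];  /* rotate left by 1, mix in byte */
    return sum;
}
```

Comparing the generated assembly (e.g. with gcc -S) for both variants
shows how much code each call site really costs.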
The confusion is even worse:
The tiny primary (and only) physically-addressed instruction cache
of the 68060 makes kernel-trapping FPU instructions, emulated by
the kernel trap emulation library (necessary for FSIN & friends,
like on the 68040 I think), *faster* than using the userland version
of the emulation library in a 68060-specific libm. I have the
numbers somewhere. Well, I had them - carefully collected after
one week of evening work to build the library, with much cursing
after reading the results.