Subject: Re: DIAGNOSTIC in -current
To: Chuck Silvers
From: David Laight
List: current-users
Date: 10/02/2006 19:57:26
On Sat, Sep 30, 2006 at 04:34:24PM -0700, Chuck Silvers wrote:
> macro
> Simple syscall: 0.3796 microseconds
> Simple read: 1.4517 microseconds
> Simple write: 1.7143 microseconds
> Simple stat: 6.0924 microseconds
> Simple fstat: 0.8437 microseconds
> func
> Simple syscall: 0.3796 microseconds
> Simple read: 1.4891 microseconds
> Simple write: 1.5419 microseconds
> Simple stat: 6.4692 microseconds
> Simple fstat: 0.8447 microseconds

This shows the clear benefits of function inlining :-)
I sometimes wonder whether 'loop unrolling' has similar effects.
Not to mention the way (certain versions of) gcc aligns branch targets.
(I looked at one at work on Friday where it had added 11 bytes of
padding before an 8-byte code snippet that ended in a branch.  Since
the padding followed the 'rts' instruction, I suspect it just causes
an extra memory read and some cache displacement.)

In real life things will be different as well. In the benchmarks above,
the entire code loop is likely to remain in the cache between iterations.
In real life that is less likely, but it is quite possible that two different
code paths will both call malloc(), and the second will find it cached.

Of course, testing this is all too hard...


David Laight