tech-kern archive


-falign-functions=16 for i386/amd64



Hi,

I propose setting -falign-functions=16 for
i386/amd64 kernels to reduce performance
fluctuations caused by small, unrelated changes.

[Background]

I noticed that IP forwarding performance had
degraded by 10% between Aug. 1 and Aug. 16.
Bisecting the commits in that range showed that
the degradation was spread over several commits
and, unfortunately, those commits aren't related
to IP forwarding performance; one example is a
change to ip6flow.

knakahara and I investigated how these
degradations happened and concluded that they
are caused by changes in where functions start
(the alignment of function code), which probably
affects CPU cache hits. (Actually this is just
our guess because we don't have a way to measure
cache hit/miss ratios for now...)

[How does -falign-functions=16 help?]

Currently, function starts in i386/amd64 kernels
are unaligned, i.e., a function can start at any
byte address depending on the objects linked
before it in the kernel. If the size of those
leading objects changes, the start addresses of
all following functions change as well.

You can see how functions are laid out by running
nm -n netbsd or just by looking at the symbol
files generated in releasedir.
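
As a rough illustration of that check, the sketch
below counts text symbols by their start address
modulo 16. It assumes the usual three-column
"address type name" output of nm; the sample
addresses and symbol names are made up for the
example and are not taken from a real kernel.

```python
from collections import Counter


def alignment_histogram(nm_output, align=16):
    """Count text symbols by (start address mod align)."""
    hist = Counter()
    for line in nm_output.splitlines():
        parts = line.split()
        # Expect "address type name"; undefined symbols have no address.
        if len(parts) != 3:
            continue
        addr, symtype, _name = parts
        # 'T'/'t' mark global/local symbols in the text (code) section.
        if symtype not in ("T", "t"):
            continue
        hist[int(addr, 16) % align] += 1
    return hist


# Hypothetical sample in the style of `nm -n` output.
SAMPLE = """\
ffffffff80100000 T start
ffffffff8010002b t helper_a
ffffffff80100040 T helper_b
ffffffff80100057 T helper_c
ffffffff80200000 D some_data
"""

if __name__ == "__main__":
    for offset, count in sorted(alignment_histogram(SAMPLE).items()):
        print(f"offset {offset:2d}: {count} function(s)")
```

For a real kernel, the text passed to
alignment_histogram() could be the saved output of
nm -n netbsd. If every function were 16-byte
aligned, only offset 0 would show up.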

If you add -falign-functions=16 to COPTS in
your kernel config, functions are aligned to
16 bytes. With that, the start address of every
function always has the form 0xXXXXXXX0 on i386
and 0xffffffffXXXXXXX0 on amd64. This ensures
that the alignment of a function's start isn't
affected by other, unrelated code changes.
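
The effect can be sketched with a toy layout
model. This is only a simplified illustration, not
how the linker actually works: functions are placed
back to back, each start is rounded up to the
requested alignment, and the function sizes are
hypothetical.

```python
def layout(sizes, base=0, align=1):
    """Return start addresses for objects placed back to back,
    with each start rounded up to a multiple of `align`."""
    addrs = []
    addr = base
    for size in sizes:
        # Round up to the next multiple of `align`.
        addr = (addr + align - 1) // align * align
        addrs.append(addr)
        addr += size
    return addrs


# Hypothetical function sizes in bytes.
before = [37, 50, 120, 64]
after = [41, 50, 120, 64]  # the first function grew by 4 bytes

if __name__ == "__main__":
    for align in (1, 16):
        offs_before = [a % 16 for a in layout(before, align=align)]
        offs_after = [a % 16 for a in layout(after, align=align)]
        print(f"align={align}: before={offs_before} after={offs_after}")
```

Without alignment, growing the first function
changes the offset within a 16-byte block of every
function after it. With align=16, all starts stay
at offset 0 in both layouts.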

[Why aren't functions aligned in the first place?]

It seems to be because of -mtune=nocona, which is
specified in bsd.own.mk. -mtune=generic gives
functions aligned to 16 bytes, but gives poorer
performance than -mtune=nocona, so I don't propose
that kind of change.

[Does -falign-functions=16 solve the issue completely?]

No. It seems some other cause(s) of performance
fluctuations remain. Nonetheless, setting
-falign-functions=16 reduces the fluctuations.

[The point of the proposal]

The aim of the proposal isn't to improve
performance by aligning kernel functions,
but to reduce performance fluctuations caused by
small, unrelated changes. Such fluctuations make
it difficult to measure the small overhead of a
change because we cannot tell whether a given
performance change comes from the change itself
or from function alignment changes.


Any suggestions or comments?

Adding -falign-functions=16 is one solution, and
there may be a better way to reach the goal. Also,
I'm not sure where we should add such an option.

Thanks,
  ozaki-r

