tech-kern archive


Re: -falign-functions=16 for i386/amd64



On Mon, Aug 29, 2016 at 4:43 PM, Ryota Ozaki <ozaki-r%netbsd.org@localhost> wrote:
> Hi,
>
> I propose setting -falign-functions=16 for
> i386/amd64 kernels to reduce performance
> fluctuations caused by small, unrelated changes.
>
> [Background]
>
> I noticed that IP forwarding performance
> degraded by 10% between Aug. 1 and Aug. 16.
> Bisecting the commits in that range shows that
> the degradation was introduced by several
> commits, and unfortunately those commits aren't
> related to IP forwarding performance; one
> example is a change to ip6flow.
>
> knakahara and I investigated how these
> degradations happened and concluded that they
> are caused by changes in where functions start
> (the alignment of function code), which probably
> affects CPU cache hit rates. (Actually this is
> just our guess, because we don't have a way to
> measure cache hit/miss ratios for now...)
>
> [How does -falign-functions=16 help?]
>
> Currently, function starts in i386/amd64
> kernels are unaligned, i.e., a function can
> start at any byte offset, depending on the
> objects linked before it in the kernel. If the
> size of a preceding object changes, the start
> addresses of all following functions change too.
>
> You can see how functions are aligned by
> running nm -n netbsd, or just by looking at the
> symbol files generated in releasedir.
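
As a rough illustration of the nm -n check described above, here
is a minimal Python sketch that counts how many text symbols
start on a 16-byte boundary. The sample listing, the symbol
names and the helper names are all hypothetical; in practice the
text would come from running nm -n on a real kernel image. Note
that T/t symbols can include labels that are not function
entries, so the result is only an approximation.

```python
import re
from collections import Counter

# Hypothetical excerpt in the format printed by `nm -n`; the addresses
# and symbol names are made up for illustration.
SAMPLE_NM_OUTPUT = """\
ffffffff80100000 T start
ffffffff8010023b T example_func_a
ffffffff80100310 t example_func_b
ffffffff80100447 T example_func_c
ffffffff80200000 D example_data
"""

# address, one-letter symbol type, symbol name
NM_LINE = re.compile(r"^([0-9a-fA-F]+)\s+(\S)\s+(\S+)$")


def alignment_histogram(nm_text, align=16):
    """Count text-section symbols by (address % align)."""
    hist = Counter()
    for line in nm_text.splitlines():
        m = NM_LINE.match(line.strip())
        if not m:
            continue  # skip undefined symbols and malformed lines
        addr, kind, _name = m.groups()
        if kind in ("T", "t"):  # text (code) symbols only
            hist[int(addr, 16) % align] += 1
    return hist


def aligned_fraction(nm_text, align=16):
    """Fraction of text symbols whose address is a multiple of align."""
    hist = alignment_histogram(nm_text, align)
    total = sum(hist.values())
    return hist[0] / total if total else 0.0


print(aligned_fraction(SAMPLE_NM_OUTPUT))  # 0.5 for the sample above
```

On a kernel built with -falign-functions=16 one would expect
this fraction to be close to 1.0 for function symbols.
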
>
> If you add -falign-functions=16 to COPTS in
> your kernel config, functions are aligned to
> 16 bytes. The start address of every function
> then always has the form 0xXXXXXXX0 on i386 and
> 0xffffffffXXXXXXX0 on amd64. This alignment keeps
> a function's position within its 16-byte block
> from changing because of unrelated code changes.
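
The layout arithmetic behind this can be sketched in a few lines
of Python. This is only a toy model, assuming functions are
placed back to back and each start is rounded up to the
alignment; the function sizes are hypothetical and a real linker
does considerably more.

```python
def layout(sizes, align=1):
    """Return start addresses when functions are placed back to back,
    with each start rounded up to a multiple of `align`."""
    starts = []
    addr = 0
    for size in sizes:
        addr = (addr + align - 1) // align * align  # round up to alignment
        starts.append(addr)
        addr += size
    return starts


# Hypothetical function sizes in bytes.
before = [37, 50, 120, 9]
after = [40, 50, 120, 9]    # the first function grew by 3 bytes
bigger = [50, 50, 120, 9]   # the first function grew past a 16-byte boundary

print(layout(before, 1))    # [0, 37, 87, 207]
print(layout(after, 1))     # [0, 40, 90, 210]  every later start moved
print(layout(before, 16))   # [0, 48, 112, 240]
print(layout(after, 16))    # [0, 48, 112, 240] padding absorbed the growth
print(layout(bigger, 16))   # [0, 64, 128, 256] moved, but by multiples of 16
```

In this model a small change is often absorbed by the padding,
and when it is not, later functions move by a multiple of 16, so
their offset within a 16-byte block stays the same.
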
>
> [Why aren't functions aligned in the first place?]
>
> It seems to be because of -mtune=nocona, which is
> specified in bsd.own.mk. -mtune=generic aligns
> functions to 16 bytes, but gives poorer
> performance than -mtune=nocona, so I don't
> propose that kind of change.
>
> [Does -falign-functions=16 solve the issue completely?]
>
> No. There seem to be other causes of
> performance fluctuations. Nonetheless,
> setting -falign-functions=16 reduces them.
>
> [The point of the proposal]
>
> The aim of the proposal isn't to improve
> performance by aligning kernel functions,
> but to reduce performance fluctuations caused by
> small, unrelated changes. Such fluctuations make
> it difficult to measure the small overhead of a
> change, because we cannot tell whether a given
> performance change comes from the change itself
> or from function alignment changes.

[Where does 16 come from?]

From the old Intel Optimization Manual (for Pentium II and III).
For recent processors 32 may be better, but for stock
kernels (such as GENERIC) 16 is probably better (for old
machines). (And if we really want to optimize, we should
use -march or -mtune instead.)

Another reason is that the stock kernels of other OSes
(FreeBSD, OpenBSD and Linux) appear to use 16-byte
alignment.

  ozaki-r

>
>
> Any suggestions or comments?
>
> Adding -falign-functions=16 is one solution, but
> there may be a better way to reach the goal. Also,
> I'm not sure where we should add such an option.
>
> Thanks,
>   ozaki-r

