Port-i386 archive


Re: Fix GCC's preferred stack alignment to match psABI



On Mon, Sep 17, 2012 at 12:28:52PM +0200, Joerg Sonnenberger wrote:
> On Sun, Sep 16, 2012 at 06:36:09PM +0100, David Laight wrote:
> > On Tue, Sep 11, 2012 at 10:22:48PM +0200, Joerg Sonnenberger wrote:
> > > Hi all,
> > > the attached patch consists of two parts. The first part ensures that
> > > GCC honours the SYSV ABI on i386 as used by NetBSD. The second part
> > > adjusts the alignment of stack variables, so that double and complex
> > > double as well as long long are not aligned to 64bit by default, if the
> > > stack alignment is smaller. This avoids triggering unnecessary stack
> > > realignments.
> > 
> > What was wrong with enabling -mstackrealign by default?
> > 
> > That leaves the efficiency of aligning the stack on 16 byte boundaries.
> > But adds explicit alignment of the stack only when 128bit alignment
> > is requested.
> 
> It is enabled by default, but doesn't work since the assumption of
> inbound 16 Bytes alignment doesn't match the ABI. There isn't a real
> intermediate option either, since the code can either take the penalty
> of realigning the stack or not. The patch as is defaults to not taking
> it and restores essentially the behavior of old GCC before Linux
> switched to the "GNU" ABI. The only case where it might actually be
> slower is if the CPU does provide a real penalty for loading a double
> from a misaligned dword (e.g. address is 4 mod 8). I doubt that a lot of
> code is affected by that, if at all. In that case, explicit realignment
> can still be requested.

It isn't enabled in the source I looked at, and specifying it on the
command line adds stack realignment only if something requires 16-byte
alignment (as is already done when 32-byte alignment is requested).

There are definitely real penalties for accessing 8-byte items
on 4-byte boundaries, and they are likely to affect real code.
Even 'rep stosd' is faster on 8-byte aligned addresses (on some CPUs).
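
As a rough illustration of the "4 mod 8" case (not taken from the patch;
the function names are made up for this example), the check amounts to
masking the low address bits:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: how far addr lies past the previous
 * align-byte boundary. align must be a power of two. */
unsigned misalignment(uintptr_t addr, unsigned align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (unsigned)(addr & (uintptr_t)(align - 1));
}

/* True when an 8-byte item at addr is 4-byte aligned but not
 * 8-byte aligned, i.e. the address is 4 mod 8. */
int is_4_mod_8(uintptr_t addr)
{
    return misalignment(addr, 8) == 4;
}
```

A double stored at 0x1004 falls into this case; one stored at 0x1008
does not.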

gcc goes to some lengths to align 8-byte items (uint64 and double)
on 8-byte stack boundaries, rather than using the 4-byte alignment
they get within structures. Your patch throws this away at a later stage.

If we set -mstackrealign and align the stack in exec and for threads,
then the stack stays aligned through gcc calls, giving aligned
8-byte items, but any frames that need 16-byte alignment get the stack
realigned (which involves using %ebp-relative addressing for the args
and %esp-relative addressing for the locals, unless alloca() is also used).

Non-gcc compiled functions might misalign the stack, but gcc code won't
mind that.
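
To make the realignment step concrete, here is a minimal sketch of the
arithmetic only, assuming a downward-growing stack as on i386. It models
the effect of masking the stack pointer in a function prologue; the name
realign_sp is hypothetical and this is not gcc's actual code:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of stack realignment: round the stack pointer
 * down to the next align-byte boundary. The stack grows downwards,
 * so rounding down only ever skips unused bytes.
 * align must be a power of two. */
uintptr_t realign_sp(uintptr_t sp, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return sp & ~(align - 1);
}
```

An incoming stack pointer of 0xbfffe8dc realigned to 16 bytes becomes
0xbfffe8d0, skipping 12 bytes. Since the number of skipped bytes is not
known at compile time, the incoming args can no longer be addressed at
a fixed offset from the realigned %esp.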

There might also be asm code lurking (maybe a JIT in pkgsrc somewhere)
that assumes 16-byte alignment. Fixing that will be a PITA.

Since there are no instructions that require 8-byte aligned data,
there is never any need to realign the stack in that case.

I've also looked at the effect of using (the builtin) alloca() as well.
That carefully allocates aligned data (by allocating too much data),
and generates a weird stack frame when the locals also require stack
alignment. You really don't want that special case at all.
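
For reference, the general over-allocate-and-round-up technique looks
roughly like this. It is a sketch of the idea, not gcc's actual alloca()
expansion, and both function names are invented for the example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: worst-case number of bytes to reserve so that
 * an align-byte aligned block of size bytes fits somewhere inside,
 * whatever the alignment of the start address. */
size_t padded_size(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return size + align - 1;
}

/* Hypothetical helper: round addr up to the next align-byte boundary
 * (unchanged if it is already aligned). */
uintptr_t align_up(uintptr_t addr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(uintptr_t)(align - 1);
}
```

Asking for 20 bytes at 16-byte alignment reserves 35 bytes; if the
reserved region starts at 0x1004, the usable block starts at 0x1010 and
up to 15 bytes are wasted.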

        David

-- 
David Laight: david%l8s.co.uk@localhost

