Port-i386 archive

gcc stack alignment

Currently gcc for i386 in NetBSD is configured to conform to the historic
SYSV ABI that only requires the stack pointer to be 4-byte aligned.

However, code compiled for Linux might expect a 16-byte aligned stack.
While we cannot guarantee that the alignment will be maintained,
matching the Linux behaviour will let some code in pkgsrc run without
patches, and some of it is difficult (as well as tedious) to patch.

The following patch (and the equivalent one for the amd64 compiler)
reverts the change made just before NetBSD 6.0 was branched, but
defaults to setting the flag to realign the stack if anything
requiring 16-byte alignment is allocated on-stack.
This will probably fix things for the programs that failed in
NetBSD 5.0.
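
To make the last point concrete, here is a rough C sketch (not part of the
patch) of the kind of on-stack object that triggers the realignment. The
helper names are made up for illustration, and _Alignas assumes a C11
compiler. A local declared with 16-byte alignment has to end up on a
16-byte boundary whatever the caller's stack looked like, while an
ordinary int only gets its natural alignment:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: how far p lies past the previous 16-byte boundary. */
static unsigned misalignment16(const void *p)
{
    return (unsigned)((uintptr_t)p & 15u);
}

/*
 * A local that really needs 16-byte alignment (think of an SSE vector
 * spilled to the stack).  If the compiler cannot assume the incoming
 * stack is aligned, it has to realign it in this function's prologue.
 */
static unsigned aligned_local_misalignment(void)
{
    _Alignas(16) unsigned char buf[16] = { 0 };
    unsigned off = misalignment16(buf);

    /* The compiler must honour the requested alignment. */
    assert(off == 0);
    return off;
}

/* An ordinary local only gets the alignment its type asks for. */
static unsigned plain_local_misalignment(void)
{
    int plain = 0;

    return misalignment16(&plain);
}
```

Compiling the aligned function for i386 with and without -mstackrealign
and comparing the prologues should show where the extra alignment
instructions go.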

There is also a big comment!


Index: netbsd-elf.h
RCS file: /cvsroot/src/external/gpl3/gcc/dist/gcc/config/i386/netbsd-elf.h,v
retrieving revision 1.3
diff -u -p -r1.3 netbsd-elf.h
--- netbsd-elf.h        14 Sep 2012 13:00:01 -0000      1.3
+++ netbsd-elf.h        30 Dec 2012 20:19:59 -0000
@@ -127,6 +127,66 @@ along with GCC; see the file COPYING3.  
 #define X87_ENABLE_ARITH(MODE) \
   (flag_excess_precision == EXCESS_PRECISION_FAST || (MODE) == DFmode)
-/* Preserve i386 psABI  */
+/* The i386 ABI (System V) only requires a 4-byte aligned stack.
+ * gcc unilaterally decided that having a 16-byte aligned stack
+ * was 'better' since SSE (etc) instructions require it.
+ *
+ * gcc will also align 8-byte locals (double, long long etc) on 8-byte
+ * stack addresses (assuming the input stack is aligned).
+ * As far as I (dsl) know, the x86 processor never needs 8-byte alignment
+ * (certainly not for any normal instructions), however there are performance
+ * benefits for aligning 8-byte items on 8-byte boundaries.
+ *
+ * If any on-stack data actually needs a greater alignment than gcc expects
+ * the stack to have, it will always add instruction(s) to align the stack.
+ * Items requiring 8-byte alignment (usually) don't trigger it.
+ * This sequence is only truly horrid if alloca() is also used.
+ *
+ * The kernel 16-byte aligns the stack on entry to main() (and threads) so
+ * that the alignment is usually maintained (if you search about this,
+ * you'll find that Linux systems don't manage to maintain the alignment!).
+ * This is sort of 'best effort' to maintain alignment.
+ *
+ * Code compiled by other compilers and asm stubs (or JIT) might not
+ * maintain the stack alignment. This causes 'random' SIGSEGV/SIGBUS.
+ *
+ * There are several options:
+ *
+ * 0) Change the ABI to require the stack be 16-byte aligned, and fix
+ * all the code that changes the stack alignment.
+ * This breaks everything compiled with old versions of gcc that might
+ * call into code compiled with the new ABI. There are also some other
+ * places where the stack can get misaligned.
+ * This just leads to random alignment faults.
+ *
+ * 1) (4-byte) to tell gcc not to maintain the stack alignment, however that
+ * seems to cause some internal compiler asserts because it (incorrectly)
+ * expects MMX registers to need 8-byte alignment.
+ *
+ * 2) (8-byte) to tell gcc to maintain (and assume) 8 byte alignment.
+ * The stack might only be 4-byte aligned, but that won't cause any code
+ * to fault - since the cpu will always handle the misaligned transfers.
+ * However some code from pkgsrc might (assembler or JIT objects) expect
+ * 16-byte alignment and still fault.
+ *
+ * 3) #define STACK_REALIGN_DEFAULT 1 so that gcc maintains 16 byte alignment
+ * but will align the stack if anything actually requires 16 byte alignment.
+ * In this case, the majority of the time, the stack will be 16-byte
+ * aligned - so code that assumes that will work.
+ * If the stack is not 16-byte aligned, but gcc needs to place a 16-byte
+ * aligned item on stack, then code to align the stack will be added.
+ * 
+ * Neither (2) nor (3) will realign the stack if any data items are
+ * marked __attribute__((aligned(8))). However the cpu won't fault if
+ * they are misaligned, and the code is very unlikely to care.
+ * 
+ * Option (3) has the advantage that it is closest to the compiler code
+ * paths and object code paths/layout that Linux uses. These will be the
+ * most tested and assumed - so are least likely to be problematic.
+ * It does lead to slightly larger stack usage than any of the other options.
+ */
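
The realignment mentioned under option (3) is, as far as the arithmetic
goes, just a mask of the stack pointer (on i386 gcc typically emits
something like "andl $-16, %esp" in the function prologue). A small C
model of that arithmetic, using hypothetical helper names, gives a feel
for how many bytes a single realignment can consume:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round a stack pointer down to a multiple of 'align' (a power of two).
 * The stack grows downwards on x86, so aligning means clearing low bits.
 */
static uintptr_t realign_down(uintptr_t sp, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return sp & ~(align - 1);
}

/* Bytes of stack lost to padding by one realignment. */
static uintptr_t realign_cost(uintptr_t sp, uintptr_t align)
{
    return sp - realign_down(sp, align);
}
```

With a 4-byte aligned incoming stack pointer the padding is 0, 4, 8 or 12
bytes per realigned frame. The model ignores the extra register gcc may
need in order to keep track of the incoming arguments.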

David Laight: david%l8s.co.uk@localhost
