NetBSD-Bugs archive


port-m68k/55556: [FIXED] amiga kernel freeze when compiled by GCC8



>Number:         55556
>Category:       port-m68k
>Synopsis:       [FIXED] amiga kernel freeze when compiled by GCC8
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    port-m68k-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Aug 09 01:00:00 +0000 2020
>Originator:     Rin Okuyama
>Release:        9.99.69
>Organization:
Department of Physics, Meiji University
>Environment:
NetBSD a1200 9.99.69 NetBSD 9.99.69 (A1200.8hack) #13: Thu Aug  6 22:44:27 JST 2020  rin@latipes:/sys/arch/amiga/compile/A1200.8hack amiga
>Description:
The amiga kernel compiled by GCC8 freezes randomly, as reported for an A1200
with a 68060:

[1] http://mail-index.netbsd.org/port-amiga/2020/06/04/msg008130.html

as well as for FS-UAE (a WinUAE-derived emulator):

[2] http://mail-index.netbsd.org/port-amiga/2020/07/23/msg008159.html

When this kind of freeze occurs, I can neither enter DDB nor obtain a crash
dump.

As suggested in [1], if kern_tc.o is replaced by the one compiled by GCC7,
the freeze is remarkably mitigated.

I've found that this is due to a wrong setting in
external/gpl3/gcc/dist/gcc/config/m68k/netbsd-elf.h:

| /* Boundary (in *bits*) on which stack pointer should be aligned.
|    The m68k/SVR4 convention is to keep the stack pointer longword aligned.  */
|
| #undef STACK_BOUNDARY
| #define STACK_BOUNDARY 32
| #undef PREFERRED_STACK_BOUNDARY
| #define PREFERRED_STACK_BOUNDARY 32

The meanings of these macros are described in
external/gpl3/gcc/dist/gcc/doc/gcc-int.info:

| -- Macro: STACK_BOUNDARY
|     Define this macro to the minimum alignment enforced by hardware for
|     the stack pointer on this machine.  The definition is a C
|     expression for the desired alignment (measured in bits).  This
|     value is used as a default if 'PREFERRED_STACK_BOUNDARY' is not
|     defined.  On most machines, this should be the same as
|     'PARM_BOUNDARY'.
|
| -- Macro: PREFERRED_STACK_BOUNDARY
|     Define this macro if you wish to preserve a certain alignment for
|     the stack pointer, greater than what the hardware enforces.  The
|     definition is a C expression for the desired alignment (measured in
|     bits).  This macro must evaluate to a value equal to or larger than
|     'STACK_BOUNDARY'.

On m68k, the architecture [3] requires the stack to be aligned to at least a
2-byte boundary, whereas the System V ABI [4] demands that it be aligned to
a 4-byte boundary.

[3] Motorola M68000 Family Programmer's Reference Manual, etc.
[4] AT&T System V Application Binary Interface Motorola 68000 Processor
    Family Supplement

Therefore, the correct settings should be
 
| /* Boundary (in *bits*) on which stack pointer should be aligned.
|    The m68k/SVR4 convention is to keep the stack pointer longword aligned.  */
|
| #if 0 /* default to 16 */
| #undef STACK_BOUNDARY
| #define STACK_BOUNDARY 32
| #endif
| #undef PREFERRED_STACK_BOUNDARY
| #define PREFERRED_STACK_BOUNDARY 32

This matches the definitions in
external/gpl3/gcc/dist/gcc/config/m68k/linux.h.

With these settings, the amiga kernel works just fine as far as I can see.
Furthermore, for amiga, mac68k, and sun3,

(1)   Kernel compiled by patched GCC8 works with
(1-a) userland built by GCC7 and non-modified GCC8, and
(1-b) userland built by patched GCC8.

(2)   Userland binaries compiled by GCC7 and non-modified GCC8 work fine
      with kernel and base libraries built by patched GCC8.

(3)   There's no regression observed for tests/kernel, tests/lib/libc/sys,
      and tests/lib/libc/gen.

Now, the question is why non-modified GCC8 fails for the amiga kernel. This
is because GCC8 allocates variables on the stack more cleverly than GCC7
does, by relying on STACK_BOUNDARY.

For example, the code below is taken from umoddi3.o compiled by non-modified
GCC8; it allocates an 8-byte object on the stack:

|  10:	240e           	movel %fp,%d2
|  12:	5b82           	subql #5,%d2	| char *p = fp - 5
|  14:	76f8           	moveq #-8,%d3
|  16:	c483           	andl %d3,%d2	| p &= ~7

On the other hand, the following is the same code segment compiled by GCC7
or patched GCC8:

|  10:	74f7           	moveq #-9,%d2
|  12:	d48e           	addl %fp,%d2	| char *p = fp - 9
|  14:	76f8           	moveq #-8,%d3
|  16:	c483           	andl %d3,%d2	| p &= ~7

The former code works only when the frame pointer, i.e., the stack boundary,
is aligned to a 4-byte boundary. Otherwise, that code corrupts the stack
frame.
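
The difference between the two sequences can be checked with a little
arithmetic. The following is a minimal C sketch, not code from the kernel or
from GCC: the function names are hypothetical, and it assumes that the saved
frame pointer is stored at fp, so that an 8-byte object at p is safe only if
p + 8 <= fp.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Slot address computed by the non-modified GCC8 sequence: p = (fp - 5) & ~7. */
uint32_t slot_gcc8(uint32_t fp)
{
    return (fp - 5) & ~UINT32_C(7);
}

/* Slot address computed by GCC7 or patched GCC8: p = (fp - 9) & ~7. */
uint32_t slot_gcc7(uint32_t fp)
{
    return (fp - 9) & ~UINT32_C(7);
}

/*
 * Nonzero if the 8-byte object occupying [p, p + 8) reaches fp or beyond.
 * Assumption of this sketch: everything at fp and above must stay intact.
 */
int overlaps_frame(uint32_t p, uint32_t fp)
{
    return p + 8 > fp;
}

/* Print both slot choices for the four 2-byte-aligned values base .. base + 6. */
void print_slot_table(uint32_t base)
{
    for (uint32_t off = 0; off < 8; off += 2) {
        uint32_t fp = base + off;
        uint32_t p8 = slot_gcc8(fp);
        uint32_t p7 = slot_gcc7(fp);

        printf("fp=%#x  gcc8: p=%#x (%s)  gcc7: p=%#x (%s)\n",
            (unsigned)fp,
            (unsigned)p8, overlaps_frame(p8, fp) ? "overlap" : "ok",
            (unsigned)p7, overlaps_frame(p7, fp) ? "overlap" : "ok");
    }
}
```

In this model, calling print_slot_table(0x1000) shows that the GCC8 sequence
is safe for fp = 0x1000, 0x1002 and 0x1004, but places the object at 0x1000
for fp = 0x1006, where it overlaps the two bytes at fp. The GCC7 sequence
stays below fp in all four cases, because subtracting 9 before masking
leaves room for the object whatever the alignment of fp is.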

Usually, code conforming to the System V ABI aligns the stack to a 4-byte
boundary. However, since the minimum alignment required by hardware is only
2 bytes, the stack is not, in general, aligned to a 4-byte boundary at every
instant; the kernel contains plenty of idioms like the one below:

| movew	%d0,%sp@-	| push 2-byte word to stack
| clrw	%sp@-		| align stack to 4-byte boundary

If an interrupt occurs between movew and clrw in this example, the stack
is not aligned to a 4-byte boundary in the interrupt handler.
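
The window in which the stack pointer is misaligned can be illustrated with
a small model. This is only a sketch in C with hypothetical helper names; it
tracks nothing but the value of sp across the two 2-byte pushes of the idiom
above, and assumes sp is longword aligned on entry.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a 2-byte push on a descending stack (movew ...,%sp@- or clrw %sp@-). */
uint32_t push16(uint32_t sp)
{
    return sp - 2;
}

/* Nonzero if sp is aligned to a 4-byte (longword) boundary. */
int is_longword_aligned(uint32_t sp)
{
    return (sp & 3) == 0;
}

/*
 * Record the alignment of sp at the three points where an interrupt could
 * be taken: before the idiom, between the two pushes, and after it.
 */
void idiom_alignment(uint32_t sp, int aligned[3])
{
    aligned[0] = is_longword_aligned(sp);
    sp = push16(sp);                        /* movew %d0,%sp@- */
    aligned[1] = is_longword_aligned(sp);   /* interrupt window */
    sp = push16(sp);                        /* clrw %sp@- */
    aligned[2] = is_longword_aligned(sp);
}
```

Starting from a longword-aligned sp, only the middle entry is zero: an
interrupt taken in that window builds its frames on a stack pointer that is
2-byte but not 4-byte aligned.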

If the stack is not aligned to a 4-byte boundary, the stack frame is
corrupted when an 8-byte object is allocated on it, as mentioned above. This
is the cause of the freeze; tc_windup() in kern_tc.c, called by hardclock()
via tc_ticktock(), uses 8-byte objects here and there for 64-bit time_t.

Actually, by inserting the assertion

|#ifdef amiga
|        uint32_t sp;
|
|        __asm volatile("movl %%sp,%0" : "=d"(sp));
|        if ((sp & 3) != 0)
|                panic("hardclock");
|#endif

into hardclock(), it fires as expected.

Since the timer interrupt has the highest priority (except for NMI) on
amiga, it can occur at almost any moment in the kernel, which results in
the freeze.

I guess that similar failures occur in userland with signals. Therefore,
STACK_BOUNDARY should be set to 16, not 32, for m68k.

Note that changing STACK_BOUNDARY from 32 to 16 also fixes the sun2 kernel.
With non-modified GCC8, the sun2 kernel crashes in strange ways during the
early boot stages. However, with this change it boots to single-user mode.
Also, with -fno-omit-frame-pointer added, it boots to multi-user mode (I
haven't figured out why).
>How-To-Repeat:
Described above.
>Fix:
Described above.


