NetBSD-Bugs archive
port-amd64/57661: Crash when booting on Xeon Silver 4416+ in KVM/Qemu
>Number: 57661
>Category: port-amd64
>Synopsis: Crash when booting on Xeon Silver 4416+ in KVM/Qemu
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: port-amd64-maintainer
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sun Oct 15 12:15:00 +0000 2023
>Originator: Harold Gutch
>Release: NetBSD current
>Organization:
>Environment:
NetBSD 10.99.10 NetBSD 10.99.10 (GENERIC) #0: Thu Oct 12 23:51:05 UTC 2023 mkrepro%mkrepro.NetBSD.org@localhost:/usr/src/sys/arch/amd64/compile/GENERIC amd64
>Description:
Booting a current kernel on a Xeon Silver 4416+ CPU inside KVM/Qemu yields:
[ 1.5179371] uvm_fault(0xffffffff81911220, 0xffffac805182a000, 2) -> e
[ 1.5179371] fatal page fault in supervisor mode
[ 1.5179371] trap type 6 code 0x2 rip 0xffffffff80fdcfdc cs 0x8 rflags 0x10206 cr2 0xffffac805182a000 ilevel 0 rsp 0xffffffff81d56d38
[ 1.5280253] curlwp 0xffffffff8188b9c0 pid 0.0 lowest kstack 0xffffffff81d512c0
kernel: page fault trap, code=0
Stopped in pid 0.0 (system) at netbsd:memset+0x2c: repe stosq %es:(%rdi)
memset() at netbsd:memset+0x2c
lwp_create() at netbsd:lwp_create+0x325
fork1() at netbsd:fork1+0x43b
main() at netbsd:main+0x47a
The stack trace is somewhat misleading. I traced the memset() to line 343 of src/sys/arch/x86/x86/fpu.c, so the actual call chain is:
lwp_create() -> uvm_lwp_fork() -> cpu_lwp_fork() -> fpu_lwp_fork() -> memset()
Adding
printf("DEBUG: sizeof(pcb2->pcb_savefpu)==%zu\n", sizeof(pcb2->pcb_savefpu));
printf("DEBUG: x86_fpu_save_size==%d\n", x86_fpu_save_size);
immediately before the memset() call prints:
[ 1.8432366] DEBUG: sizeof(pcb2->pcb_savefpu)==576
[ 1.8432366] DEBUG: x86_fpu_save_size==11008
Changing the VM's CPU model to Sandy Bridge instead prints:
[ 1.8897648] DEBUG: sizeof(pcb2->pcb_savefpu)==576
[ 1.8897648] DEBUG: x86_fpu_save_size==832
This also *seems* odd, but with this CPU model the machine works. The comment at line 80 of src/sys/arch/amd64/include/pcb.h appears to suggest that pcb_savefpu extends to the end of the page, so I assume the 832 vs. 576 byte discrepancy falls under "yes... but that's OK". With x86_fpu_save_size==11008, however, the memset() writes far beyond the end of the page: 11008 bytes is more than two full 4096-byte pages, so it cannot fit no matter where in the page pcb_savefpu starts.
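The size argument above can be expressed as a simple bounds check. This is a hypothetical sketch, not NetBSD code: fpu_area_fits() and SKETCH_PAGE_SIZE are illustrative names, the 4096-byte page size is an assumption matching amd64, and the start offset of pcb_savefpu within its page is left as a parameter because it is not stated in this report.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption: 4 KiB pages, as on amd64. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Hypothetical helper (not NetBSD code): returns true if an FPU save
 * area of save_size bytes, starting at byte offset savefpu_off within
 * a page, stays entirely inside that page.
 */
static bool
fpu_area_fits(size_t savefpu_off, size_t save_size)
{
	/* An offset past the page end can never fit anything. */
	if (savefpu_off > SKETCH_PAGE_SIZE)
		return false;
	/* Compare against the bytes remaining up to the page end. */
	return save_size <= SKETCH_PAGE_SIZE - savefpu_off;
}
```

With the values from this report, fpu_area_fits(0, 11008) is false even for the most favourable offset of 0, while 576 and 832 fit as long as pcb_savefpu starts early enough in the page. Subtracting from the page size, rather than adding to the offset, avoids any overflow in the comparison.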
>How-To-Repeat:
Boot NetBSD in Qemu/KVM on a Linux host with a Xeon Silver 4416+ CPU, using "-cpu host".
Possibly also: boot NetBSD natively on a machine with such a CPU (untested so far, as I currently have no such machine available for testing).
>Fix:
In Qemu, select a different CPU type without AVX-512, e.g., Sandy Bridge.
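To compare what different Qemu CPU models advertise, a small userland program can query CPUID leaf 0xD, which enumerates the XSAVE state area size. This is a sketch under assumptions: it relies on the <cpuid.h> header shipped with GCC and Clang, the helper name xsave_area_sizes() is hypothetical, and the idea that the kernel derives x86_fpu_save_size from this leaf is an assumption, not something verified in this report.

```c
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/*
 * Hypothetical helper: query CPUID leaf 0xD, sub-leaf 0.
 *   EBX: bytes XSAVE needs for the features currently enabled in XCR0.
 *   ECX: bytes XSAVE needs for all features the CPU supports.
 * Returns false on non-x86 builds, or if the (virtual) CPU does not
 * report XSAVE support or leaf 0xD.
 */
static bool
xsave_area_sizes(unsigned int *enabled_size, unsigned int *max_size)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	/* CPUID.1:ECX bit 26 (bit_XSAVE) is the XSAVE feature flag. */
	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx) || !(ecx & bit_XSAVE))
		return false;
	/* __get_cpuid_count() fails if leaf 0xD exceeds the max leaf. */
	if (!__get_cpuid_count(0xd, 0, &eax, &ebx, &ecx, &edx))
		return false;
	*enabled_size = ebx;
	*max_size = ecx;
	return true;
#else
	(void)enabled_size;
	(void)max_size;
	return false;
#endif
}
```

Running this inside the guest once with "-cpu host" and once with the Sandy Bridge model should show whether the advertised XSAVE sizes differ in the same way as the DEBUG output above.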