NetBSD-Bugs archive


port-amd64/57661: Crash when booting on Xeon Silver 4416+ in KVM/Qemu



>Number:         57661
>Category:       port-amd64
>Synopsis:       Crash when booting on Xeon Silver 4416+ in KVM/Qemu
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    port-amd64-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Oct 15 12:15:00 +0000 2023
>Originator:     Harold Gutch
>Release:        NetBSD current
>Organization:
>Environment:
NetBSD  10.99.10 NetBSD 10.99.10 (GENERIC) #0: Thu Oct 12 23:51:05 UTC 2023  mkrepro%mkrepro.NetBSD.org@localhost:/usr/src/sys/arch/amd64/compile/GENERIC amd64
>Description:
Booting a current kernel on a Xeon Silver 4416+ CPU inside KVM/Qemu yields:

[   1.5179371] uvm_fault(0xffffffff81911220, 0xffffac805182a000, 2) -> e
[   1.5179371] fatal page fault in supervisor mode
[   1.5179371] trap type 6 code 0x2 rip 0xffffffff80fdcfdc cs 0x8 rflags 0x10206 cr2 0xffffac805182a000 ilevel 0 rsp 0xffffffff81d56d38
[   1.5280253] curlwp 0xffffffff8188b9c0 pid 0.0 lowest kstack 0xffffffff81d512c0
kernel: page fault trap, code=0
Stopped in pid 0.0 (system) at  netbsd:memset+0x2c:     repe stosq      %es:(%rdi)
memset() at netbsd:memset+0x2c
lwp_create() at netbsd:lwp_create+0x325
fork1() at netbsd:fork1+0x43b
main()at netbsd:main+0x47a


The stack trace is somewhat misleading: I traced the memset() to line 343 of src/sys/arch/x86/x86/fpu.c, so the actual call chain is:
lwp_create() -> uvm_lwp_fork() -> cpu_lwp_fork() -> fpu_lwp_fork() -> memset()

Adding
  printf("DEBUG: sizeof(pcb2->pcb_savefpu)==%zu\n", sizeof(pcb2->pcb_savefpu));
  printf("DEBUG: x86_fpu_save_size==%d\n", x86_fpu_save_size);

before the memset() call prints

  [   1.8432366] DEBUG: sizeof(pcb2->pcb_savefpu)==576
  [   1.8432366] DEBUG: x86_fpu_save_size==11008

Changing the VM's CPU to Sandy Bridge prints

  [  1.8897648] DEBUG: sizeof(pcb2->pcb_savefpu)==576
  [  1.8897648] DEBUG: x86_fpu_save_size==832

... which also *seems* odd, but the machine boots in that case.  The comment at line 80 of src/sys/arch/amd64/include/pcb.h appears to suggest that pcb_savefpu extends to the end of the page, so I assume the 832 vs. 576 byte discrepancy is intentional and harmless.  With x86_fpu_save_size==11008, however, the memset() writes far beyond the end of the page.
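
One possible reading of the two sizes, offered as an assumption rather than something verified in this report: under the typical Intel XSAVE layout, 832 bytes is where the AVX component ends (offset 576, size 256), and 11008 bytes is where the AMX tile data component ends (offset 2816, size 8192).  The authoritative offsets come from CPUID leaf 0xD on the actual CPU.  The sketch below only redoes that arithmetic and the page-size comparison; all helper names are hypothetical and none of this is NetBSD kernel code.

```c
#include <stddef.h>

/* amd64 base page size; "end of the page" in the report refers to this. */
#define PAGE_BYTES 4096u

/* sizeof(pcb2->pcb_savefpu) as printed by the DEBUG line in the report. */
#define PCB_SAVEFPU_BYTES 576u

/*
 * ASSUMPTION: typical Intel XSAVE component placement (offset, size).
 * The real values must be read from CPUID leaf 0xD on the CPU in question.
 */
struct xsave_component {
	const char *name;
	size_t offset;
	size_t size;
};

static const struct xsave_component avx_state =
    { "AVX (upper YMM halves)", 576, 256 };
static const struct xsave_component amx_tiledata =
    { "AMX tile data", 2816, 8192 };

/* First byte past the given component, i.e. the area size if it is last. */
static size_t
area_end(const struct xsave_component *c)
{
	return c->offset + c->size;
}

/* Bytes cleared beyond the declared member when save_size bytes are zeroed. */
static size_t
overrun_bytes(size_t save_size)
{
	return save_size > PCB_SAVEFPU_BYTES ? save_size - PCB_SAVEFPU_BYTES : 0;
}

/*
 * Nonzero if a save area of save_size bytes, starting at in-page offset
 * start, stays inside one page.  The real offset of pcb_savefpu within
 * its page is not given in the report, so start is a hypothetical input.
 */
static int
fits_in_page(size_t start, size_t save_size)
{
	return start <= PAGE_BYTES && save_size <= PAGE_BYTES - start;
}
```

Even with the most favourable hypothetical start offset of 0, fits_in_page(0, 11008) is false, because 11008 bytes is larger than a whole 4096-byte page, while 832 bytes can fit when enough of the page remains after the member's start.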
>How-To-Repeat:
Boot NetBSD as a KVM/Qemu guest with "-cpu host" on a Linux host that has a Xeon Silver 4416+ CPU.

Possibly alternatively:  Boot NetBSD natively on a machine with such a CPU (untested so far; I do not currently have such a machine available for testing).
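
To check in advance whether a given host or guest is likely to hit this, the XSAVE area size the CPU advertises can be read from userland.  The following is a minimal sketch, assuming a GCC or Clang toolchain that provides <cpuid.h>, and assuming (not confirmed in this report) that the kernel's x86_fpu_save_size is derived from the same CPUID leaf.  The function names are hypothetical.

```c
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/*
 * Return the XSAVE area size in bytes that CPUID leaf 0xD, sub-leaf 0
 * reports in ECX (the size needed for all state components the CPU
 * supports).  Returns 0 if the leaf is unavailable or on non-x86 builds.
 */
static uint32_t
xsave_max_size(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid_count(0x0d, 0, &eax, &ebx, &ecx, &edx))
		return 0;
	return ecx;
#else
	return 0;
#endif
}

/* Nonzero if the reported size is larger than one 4096-byte amd64 page. */
static int
exceeds_one_page(uint32_t size)
{
	return size > 4096u;
}
```

Running this inside the guest with "-cpu host" and again with a Sandy Bridge CPU model should make the difference visible: the sizes printed in the Description (11008 vs. 832) fall on opposite sides of the exceeds_one_page() threshold.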
>Fix:
Workaround: in Qemu, select a different CPU type without AVX-512, e.g., Sandy Bridge.


