--- Begin Message ---
> Date: Fri, 11 Apr 2025 08:22:59 +0000
> From: Emmanuel Dreyfus <manu%netbsd.org@localhost>
>
> I tried to boot NetBSD-current (i386 and amd6), BIOS and UEFI modes) on a
> Dell server, it always panic on a "fatal page fault in supervisor mode"
> early during the boot. The backtrace says we went there through
> main/fork1/lwp_create/memset
>
> A screenshot is available there:
> https://dl.espci.fr/ticket/2a34048a0a9749c68cfb0dc380cc2259
>
> Any idea how to debug that? I expected NetBSD to lack some hardware
> support on this recent machine, not to be unable to reach single user
> mode.
This looks like the same issue as:
PR kern/57661: Crash when booting on Xeon Silver 4416+ in KVM/Qemu
https://gnats.netbsd.org/57661
Some additional diagnostics from the PR:
> The stack trace is a bit of a red herring, I traced down the memset to line 343 of src/sys/arch/x86/x86/fpu.c, so we actually have:
> lwp_create() -> uvm_lwp_fork() -> cpu_lwp_fork() -> fpu_lwp_fork() -> memset()
>
> Adding
> printf("DEBUG: sizeof(pcb2->pcb_savefpu)==%ld\n", sizeof(pcb2->pcb_savefpu));
> printf("DEBUG: x86_fpu_save_size==%d\n", x86_fpu_save_size);
>
> before the memset() call prints
>
> [ 1.8432366] DEBUG: sizeof(pcb2->pcb_savefpu)==576
> [ 1.8432366] DEBUG: x86_fpu_save_size==11008
>
> Changing the VM's CPU to Sandy Bridge prints
>
> [ 1.8897648] DEBUG: sizeof(pcb2->pcb_savefpu)==576
> [ 1.8897648] DEBUG: x86_fpu_save_size==832
So, the x86_fpu_save_size is larger than we can handle and we don't
report the problem gracefully either -- my initial attempt to put a
more obvious panic in x86/identcpu.c when we initially determine
x86_fpu_save_size hit the problem that it happens too early and sends
the machine into a reboot loop.
Unfortunately, simply expanding the space for pcb_savefpu is not
trivial:
80 /*
81 * IMPORTANT NOTE: this structure, including the variable-sized FPU state at
82 * the end, must fit within one page.
83 */
84 struct pcb {
...
103 union savefpu pcb_savefpu __aligned(64); /* floating point state */
104 /* **** DO NOT ADD ANYTHING HERE **** */
105 };
https://nxr.netbsd.org/xref/src/sys/arch/amd64/include/pcb.h?r=1.32#80
The top comment came from this commit:
> Module Name: src
> Committed By: maxv
> Date: Tue Mar 17 17:18:49 UTC 2020
>
> Modified Files:
> src/sys/arch/amd64/include: cpu.h param.h pcb.h types.h
> src/sys/arch/x86/x86: vm_machdep.c
>
> Log Message:
> Add a redzone between the pcb and the stack. Sent to port-amd64@.
https://mail-index.netbsd.org/source-changes/2020/03/17/msg115178.html
So, the immediate issue is that the memset in fpu_lwp_fork is hitting
the guard page. Perhaps resolving that is simply a matter of teaching
cpu_uarea_alloc to advance by round_page(offsetof(struct pcb,
pcb_savefpu) + x86_fpu_save_size) rather than by PAGE_SIZE when
deciding where to set the guard page:
364 void *
365 cpu_uarea_alloc(bool system)
366 {
367 vaddr_t base, va;
368 paddr_t pa;
369
370 base = uvm_km_alloc(kernel_map, USPACE + PAGE_SIZE, 0,
371 UVM_KMF_WIRED|UVM_KMF_WAITVA);
372
373 /* Page[1] = RedZone */
374 va = base + PAGE_SIZE;
375 if (!pmap_extract(pmap_kernel(), va, &pa)) {
376 panic("%s: impossible, Page[1] unmapped", __func__);
377 }
378 pmap_kremove(va, PAGE_SIZE);
379 uvm_pagefree(PHYS_TO_VM_PAGE(pa));
https://nxr.netbsd.org/xref/src/sys/arch/x86/x86/vm_machdep.c?r=1.46#352
But the bottom comment, /* **** DO NOT ADD ANYTHING HERE **** */, is
much older:
> Module Name: src
> Committed By: dsl
> Date: Thu Feb 20 18:19:10 UTC 2014
>
> Modified Files:
> src/sys/arch/amd64/amd64: genassym.cf machdep.c
> src/sys/arch/amd64/include: pcb.h proc.h
> src/sys/arch/i386/i386: locore.S machdep.c
> src/sys/arch/i386/include: pcb.h proc.h
> src/sys/arch/x86/x86: vm_machdep.c
> src/sys/sys: param.h
>
> Log Message:
> Move the amd64 and i386 pcb to the bottom of the uarea, and move the
> kernel stack to the top.
> Change the pcb layouts so that fpu save area is at the end and is
> 64byte aligned ready for xsave (saving the ymm registers).
> Welcome to 6.99.32
https://mail-index.netbsd.org/source-changes/2014/02/20/msg051841.html
It's not obvious to me whether the comment affects anything at hand.
Perhaps it's just admonishing future developers not to add things
_other than fpu state_ because the fpu state is actually variable size
and may extend past the size of a union savefpu object.
--- End Message ---