NetBSD-Bugs archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: port-amd64/57661: Crash when booting on Xeon Silver 4416+ in KVM/Qemu



Followup from thread on tech-kern:

https://mail-index.netbsd.org/tech-kern/2025/04/11/msg030349.html
--- Begin Message ---
> Date: Fri, 11 Apr 2025 08:22:59 +0000
> From: Emmanuel Dreyfus <manu%netbsd.org@localhost>
> 
> I tried to boot NetBSD-current (i386 and amd6), BIOS and UEFI modes) on a 
> Dell server, it always panic on a "fatal page fault in supervisor mode"
> early during the boot. The backtrace says we went there through
> main/fork1/lwp_create/memset
> 
> A screenshot is available there:
> https://dl.espci.fr/ticket/2a34048a0a9749c68cfb0dc380cc2259
> 
> Any idea how to debug that? I expected NetBSD to lack some hardware
> support on this recent machine, not to be unable to reach single user 
> mode.

This looks like the same issue as:

PR kern/57661: Crash when booting on Xeon Silver 4416+ in KVM/Qemu
https://gnats.netbsd.org/57661

Some additional diagnostics from the PR:

> The stack trace is a bit of a red herring, I traced down the memset to line 343 of src/sys/arch/x86/x86/fpu.c, so we actually have:
> lwp_create() -> uvm_lwp_fork() -> cpu_lwp_fork() -> fpu_lwp_fork() -> memset()
> 
> Adding
>   printf("DEBUG: sizeof(pcb2->pcb_savefpu)==%ld\n", sizeof(pcb2->pcb_savefpu));
>   printf("DEBUG: x86_fpu_save_size==%d\n", x86_fpu_save_size);
> 
> before the memset() call prints
> 
>   [   1.8432366] DEBUG: sizeof(pcb2->pcb_savefpu)==576
>   [   1.8432366] DEBUG: x86_fpu_save_size==11008
> 
> Changing the VM's CPU to Sandy Bridge prints
> 
>   [  1.8897648] DEBUG: sizeof(pcb2->pcb_savefpu)==576
>   [  1.8897648] DEBUG: x86_fpu_save_size==832

So, the x86_fpu_save_size is larger than we can handle and we don't
report the problem gracefully either -- my initial attempt to put a
more obvious panic in x86/identcpu.c when we initially determine
x86_fpu_save_size hit the problem that it happens too early and sends
the machine into a reboot loop.

Unfortunately, simply expanding the space for pcb_savefpu is not
trivial:

     80 /*
     81  * IMPORTANT NOTE: this structure, including the variable-sized FPU state at
     82  * the end, must fit within one page.
     83  */
     84 struct pcb {
...
    103 	union savefpu	pcb_savefpu __aligned(64); /* floating point state */
    104 	/* **** DO NOT ADD ANYTHING HERE **** */
    105 };

https://nxr.netbsd.org/xref/src/sys/arch/amd64/include/pcb.h?r=1.32#80

The top comment came from this commit:

> Module Name:    src
> Committed By:   maxv
> Date:           Tue Mar 17 17:18:49 UTC 2020
> 
> Modified Files:
>         src/sys/arch/amd64/include: cpu.h param.h pcb.h types.h
>         src/sys/arch/x86/x86: vm_machdep.c
> 
> Log Message:
> Add a redzone between the pcb and the stack. Sent to port-amd64@.

https://mail-index.netbsd.org/source-changes/2020/03/17/msg115178.html

So, the immediate issue is that the memset in fpu_lwp_fork is hitting
the guard page.  Perhaps resolving that is simply a matter of teaching
cpu_uarea_alloc to advance by round_page(offsetof(struct pcb,
pcb_savefpu) + x86_fpu_save_size) rather than by PAGE_SIZE when
deciding where to set the guard page:

    364 void *
    365 cpu_uarea_alloc(bool system)
    366 {
    367 	vaddr_t base, va;
    368 	paddr_t pa;
    369 
    370 	base = uvm_km_alloc(kernel_map, USPACE + PAGE_SIZE, 0,
    371 	    UVM_KMF_WIRED|UVM_KMF_WAITVA);
    372 
    373 	/* Page[1] = RedZone */
    374 	va = base + PAGE_SIZE;
    375 	if (!pmap_extract(pmap_kernel(), va, &pa)) {
    376 		panic("%s: impossible, Page[1] unmapped", __func__);
    377 	}
    378 	pmap_kremove(va, PAGE_SIZE);
    379 	uvm_pagefree(PHYS_TO_VM_PAGE(pa));

https://nxr.netbsd.org/xref/src/sys/arch/x86/x86/vm_machdep.c?r=1.46#352

But the bottom comment, /* **** DO NOT ADD ANYTHING HERE **** */, is
much older:

> Module Name:    src
> Committed By:   dsl
> Date:           Thu Feb 20 18:19:10 UTC 2014
> 
> Modified Files:
>         src/sys/arch/amd64/amd64: genassym.cf machdep.c
>         src/sys/arch/amd64/include: pcb.h proc.h
>         src/sys/arch/i386/i386: locore.S machdep.c
>         src/sys/arch/i386/include: pcb.h proc.h
>         src/sys/arch/x86/x86: vm_machdep.c
>         src/sys/sys: param.h
> 
> Log Message:
> Move the amd64 and i386 pcb to the bottom of the uarea, and move the
>   kernel stack to the top.
> Change the pcb layouts so that fpu save area is at the end and is
>   64byte aligned ready for xsave (saving the ymm registers).
> Welcome to 6.99.32

https://mail-index.netbsd.org/source-changes/2014/02/20/msg051841.html

It's not obvious to me whether the comment affects anything at hand.
Perhaps it's just admonishing future developers not to add things
_other than fpu state_ because the fpu state is actually variable size
and may extend past the size of a union savefpu object.

--- End Message ---


Home | Main Index | Thread Index | Old Index