NetBSD-Bugs archive


kern/54503: Panic while attaching nvme(4) when # of logical CPUs >= 32?



>Number:         54503
>Category:       kern
>Synopsis:       Panic while attaching nvme(4) when # of logical CPUs >= 32?
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Aug 29 07:55:00 +0000 2019
>Originator:     Rin Okuyama
>Release:        9.99.10
>Organization:
Department of Physics, Meiji University
>Environment:
NetBSD kobrpd02 9.99.10 NetBSD 9.99.10 (GENERIC) #1: Thu Aug 29 12:07:14 JST 2019  rin@latipes:/build/work/src/sys/arch/amd64/compile/GENERIC amd64
>Description:
A panic occurs while attaching nvme(4) when the number of logical CPUs is
48 (24 cores x 2 threads per core):

>> NetBSD/x86 BIOS Boot, Revision 5.11 (Tue Aug 27 14:53:16 UTC 2019) (from NetBSD 9.99.10)
>> Memory: 629/1702608 k
> boot -v
...
NetBSD 9.99.10 (GENERIC) #1: Thu Aug 29 12:07:14 JST 2019
...
cpu0 at mainbus0 apid 0
timecounter: Timecounter "lapic" frequency 24972680 Hz quality -100
cpu0: Intel(R) Xeon(R) Silver 4116 CPU @ 2.10GHz, id 0x50654
cpu0: package 0, core 0, smt 0
...
cpu47 at mainbus0 apid 59
cpu47: Intel(R) Xeon(R) Silver 4116 CPU @ 2.10GHz, id 0x50654
cpu47: package 1, core 13, smt 1
...
ppb3 at pci5 dev 0 function 0: vendor 8086 product 2030 (rev. 0x04)
ppb3: PCI Express capability version 2 <Root Port of PCI-E Root Complex> x16 @ 8.0GT/s
ppb3: link is x4 @ 8.0GT/s
pci6 at ppb3 bus 59
pci6: i/o space, memory space enabled, rd/line, wr/inv ok
nvme0 at pci6 dev 0 function 0: vendor 8086 product 2700 (rev. 0x00)
nvme0: NVMe 1.0
allocated pic msix4 type edge pin 0 level 6 to cpu0 slot 20 idt entry 101
nvme0: for admin queue interrupting at msix4 vec 0
nvme0: INTEL SSDPED1D280GA, firmware E2010325, serial PHMB742101WX280CGN
allocated pic msix4 type edge pin 1 level 6 to cpu0 slot 21 idt entry 102
nvme0: for io queue 1 interrupting at msix4 vec 1 affinity to cpu0
allocated pic msix4 type edge pin 2 level 6 to cpu0 slot 22 idt entry 103
nvme0: for io queue 2 interrupting at msix4 vec 2 affinity to cpu1
...
allocated pic msix4 type edge pin 31 level 6 to cpu0 slot 22 idt entry 136
nvme0: for io queue 31 interrupting at msix4 vec 31 affinity to cpu30
prevented execution of 0x0 (SMEP)
fatal page fault in supervisor mode
trap type 6 code 0x10 rip 0 cs 0x8 rflags 0x10202 cr2 0 ilevel 0x8 rsp 0xffffffff81ae8318
curlwp 0xffffffff8165cc20 pid 0.1 lowest kstack 0xffffffff81ae42c0
kernel: page fault trap, code=0
Stopped in pid 0.1 (system) at  0:uvm_fault(0xffffffff817856e0, 0xffff8d4680000000, 1) -> e
fatal page fault in supervisor mode
trap type 6 code 0 rip 0xffffffff8021e5b6 cs 0x8 rflags 0x10a06 cr2 0xffff8d4680000000 ilevel 0x8 rsp 0xffffffff81ae7f00
curlwp 0xffffffff8165cc20 pid 0.1 lowest kstack 0xffffffff81ae42c0
      kernel: page fault trap, code=0
Stopped in pid 0.1 (system) at  netbsd:db_disasm+0xec:  movq    0(%rdx,%rcx,8),%rcx
db{0}> bt
db_disasm() at netbsd:db_disasm+0xec
db_trap() at netbsd:db_trap+0xf4
kdb_trap() at netbsd:kdb_trap+0xe1
trap() at netbsd:trap+0x327
--- trap (number 6) ---
?() at 0
nvme_poll() at netbsd:nvme_poll+0x104             <--- nvme.c:1261
nvme_attach() at netbsd:nvme_attach+0x927         <--- nvme.c:1509
nvme_pci_attach() at netbsd:nvme_pci_attach+0x309 <--- nvme_pci.c:228
config_attach_loc() at netbsd:config_attach_loc+0x1a5
...

The full dmesg (captured on the serial console) and netbsd.gdb are
available here (the kernel is GENERIC with MSGBUFSIZE=1048576):

http://www.netbsd.org/~rin/nvme_panic_20190829/dmesg
http://www.netbsd.org/~rin/nvme_panic_20190829/netbsd.gdb.gz (CAUTION HUGE!!)

When hyper-threading is disabled in the BIOS, i.e., the number of logical
CPUs equals the number of cores (24), the system boots fine. Here are the
outputs of dmesg -t, intrctl list, pcictl pci6 dump -d 0, and acpidump -d:

http://www.netbsd.org/~rin/nvme_panic_20190829/HT_disabled/dmesg
http://www.netbsd.org/~rin/nvme_panic_20190829/HT_disabled/intrctl
http://www.netbsd.org/~rin/nvme_panic_20190829/HT_disabled/pcictl
http://www.netbsd.org/~rin/nvme_panic_20190829/HT_disabled/acpidump

The system also boots fine with hyper-threading enabled in the BIOS if
nvme* is disabled via userconf(4). Here are the outputs of dmesg -t,
intrctl list, pcictl pci6 dump -d 0, and acpidump -d:

http://www.netbsd.org/~rin/nvme_panic_20190829/nvme_disabled/dmesg
http://www.netbsd.org/~rin/nvme_panic_20190829/nvme_disabled/intrctl
http://www.netbsd.org/~rin/nvme_panic_20190829/nvme_disabled/pcictl
http://www.netbsd.org/~rin/nvme_panic_20190829/nvme_disabled/acpidump

Is there any other information I can provide? Since this machine is
located in my previous office, I cannot access its console or BIOS right
away. For now, it is running with hyper-threading disabled.
>How-To-Repeat:
Boot that machine with both nvme* and hyper-threading enabled.
>Fix:
N/A


