tech-kern archive


Kernel locks when configuring motherboard ethernet



Hi, all,

On several amd64 systems which boot via UEFI, the first attempt to make changes to the built-in Realtek ethernet port locks the kernel.

I first observed this issue a while ago, but only on old (2014) AMD AM1 motherboards, so I thought it was a quirk of older UEFI firmware. I noticed it didn't happen if I enabled the network stack in the BIOS settings, even though I don't turn on PXE for either IPv4 or IPv6. I figured NetBSD isn't initializing the re* interface properly when booting via UEFI, but the BIOS does when its network stack is enabled, so that setting worked around the problem.

However, I've recently set up some newer systems that show the same issue. They are Ryzen AM4 and AM5 systems with BIOSes that have been updated within the last month or two.

The observed behavior is that if the network is very busy when the system boots, the kernel locks up most of the time. If the network is idle, the kernel doesn't lock up.

In the past, I've never been able to get into the kernel debugger, since the lockup prevents all keyboard activity, but I now have a colocated system with a serial console where I can drop into the debugger.

Here's what I got from a lockup that followed an attempt to configure re0:

[  30.8543400] fatal breakpoint trap in supervisor mode
[  30.8543400] trap type 1 code 0 rip 0xffffffff80235385 cs 0x8 rflags 0x202 cr2 0x7f7ede210ff8 ilevel 0x8 rsp 0xffffc48839ae4be8
[  30.8543400] curlwp 0xffff8fdc02285100 pid 0.11 lowest kstack 0xffffc48839ae02c0
Stopped in pid 0.11 (system) at netbsd:breakpoint+0x5:  leave
breakpoint() at netbsd:breakpoint+0x5
comintr() at netbsd:comintr+0x86d
intr_kdtrace_wrapper() at netbsd:intr_kdtrace_wrapper+0x26
Xhandle_ioapic_edge1() at netbsd:Xhandle_ioapic_edge1+0x75
--- interrupt ---
bus_space_read_2() at netbsd:bus_space_read_2+0xb
intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x37
Xhandle_ioapic_edge18() at netbsd:Xhandle_ioapic_edge18+0x75
--- interrupt ---
_kernel_lock() at netbsd:_kernel_lock+0xca
if_link_state_change_work() at netbsd:if_link_state_change_work+0x1e
workqueue_worker() at netbsd:workqueue_worker+0xd9
ds          5c20
es          4c18
fs          524
gs          5230
rdi         ffffffff81845120    x86_io
rsi         3f8
rbp         ffffc48839ae4be8
rbx         ffffc4803df8c006
rdx         1
rcx         100
rax         7f
r8          0
r9          0
r10         ffffc4884f9f8eec
r11         ffffc4884f9f8ee8
r12         ffff8fd5096a1790
r13         7fd
r14         c6
r15         ffff8fd5096a16c0
rip         ffffffff80235385    breakpoint+0x5
cs          8
rflags      202
rsp         ffffc48839ae4be8
ss          10
netbsd:breakpoint+0x5:  leave


Also, I tried a kernel with LOCKDEBUG and it panicked before finishing boot:

Configuring network interfaces: re0[  22.9855318] cpu0[564 sh]: hogging kernel lock
[  22.9855318] ipi_msg_cpu_handler() at netbsd:ipi_msg_cpu_handler+0x56
[  22.9855318] ipi_cpu_handler() at netbsd:ipi_cpu_handler+0x70
[  22.9855318] x86_ipi_handler() at netbsd:x86_ipi_handler+0x6f
[  22.9855318] Xresume_lapic_ipi() at netbsd:Xresume_lapic_ipi+0x18
[  22.9855318] --- interrupt ---
[  22.9855318] Xspllower() at netbsd:Xspllower+0xe
[  22.9855318] Xresume_lapic_ltimer() at netbsd:Xresume_lapic_ltimer+0x1e
[  22.9855318] --- interrupt ---
[  22.9855318] bus_space_read_2() at netbsd:bus_space_read_2+0xb
[  22.9855318] intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x37
[  22.9855318] Kernel lock error: _kernel_lock,266: spinout

[  23.7255334] lock address : netbsd:kernel_lock
[  23.7755332] type         : spin
[  23.8155330] initialized  : netbsd:main+0x31
[  23.8655316] shared holds :                  0 exclusive:                  1
[  23.9555319] shares wanted:                  0 exclusive:                  1
[  24.0355338] relevant cpu :                  1 last held:                  0
[  24.1155337] relevant lwp : 0xfffff8a0cd91e680 last held: 0xfffff8a0d003fac0
[  24.2055344] last locked* : netbsd:intr_biglock_wrapper+0x15
[  24.2655342] unlocked     : netbsd:softint_dispatch+0x186
[  24.3355336] curcpu holds :                  0 wanted by: 0xfffff8a0cd91e680

[  24.4155346] panic: LOCKDEBUG: Kernel lock error: _kernel_lock,266: spinout
[  24.4955330] cpu1: Begin traceback...
[  24.5455334] vpanic() at netbsd:vpanic+0x183
[  24.5955350] panic() at netbsd:panic+0x3c
[  24.6455370] lockdebug_abort1() at netbsd:lockdebug_abort1+0xe6
[  24.7055350] _kernel_lock() at netbsd:_kernel_lock+0x2a7
[  24.7755337] softint_dispatch() at netbsd:softint_dispatch+0x16d
[  24.8455340] DDB lost frame for netbsd:Xsoftintr+0x4c, trying 0xffff9a0839ba30f0
[  24.9255357] Xsoftintr() at netbsd:Xsoftintr+0x4c
[  24.9855339] --- interrupt ---
[  25.0155356] 0:
[  25.0355354] cpu1: End traceback...

[  25.0855358] dumping to dev 168,2 (offset=68267703, size=8243720):
[ 25.1555346] dump 1262 1261 1260 1259 1258 1257 1256 1255 1254 1253 1252 1251 1250 1249 1248 1247 1246 1245 1244 1243 1242 1241 ...
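
For anyone wanting to compare traces like the ones above side by side, the following is a minimal sketch that pulls the symbol names and offsets out of pasted ddb backtrace text and lists the frames two traces share. The helper names `parse_frames` and `common_frames` are hypothetical and not part of any NetBSD tooling, and the pattern assumes frame lines of the form `name() at netbsd:name+0xNN`, optionally preceded by a bracketed timestamp.

```python
import re

# Matches ddb frame lines such as "comintr() at netbsd:comintr+0x86d",
# optionally prefixed by a timestamp like "[  22.9855318] ".
# Assumption: every frame of interest is in the "netbsd:" kernel image.
FRAME_RE = re.compile(
    r"^(?:\[\s*[\d.]+\]\s*)?(\w+)\(\) at\s+netbsd:(\w+)\+0x([0-9a-f]+)\s*$"
)


def parse_frames(text):
    """Return a list of (symbol, offset) tuples, in the order they appear."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            frames.append((m.group(2), int(m.group(3), 16)))
    return frames


def common_frames(trace_a, trace_b):
    """Return symbols from trace_a that also appear in trace_b, in trace_a order."""
    names_b = {name for name, _ in parse_frames(trace_b)}
    return [name for name, _ in parse_frames(trace_a) if name in names_b]
```

Lines that are not frames, such as `--- interrupt ---`, the `Stopped in pid` line, and the `DDB lost frame` notice, are skipped because they do not match the pattern.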


Since using the LOCKDEBUG kernel, this system can't use the network at all without locking up, even after a hardware reset. It's colocated, so while I can have someone physically power cycle the machine, I figured I'd leave it in case more information can be gained from it as it is.

The serial console can be reached from another system using cu, and that other system can also trigger a hardware reset. The affected system obviously can't talk on the Internet, but it has netbsd-10 sources and can compile a kernel for itself.

The previous kernel, which had been running for a couple of weeks, locked up twice. I don't know whether that's directly related to this, because those lockups had nothing to do with configuring network ports. Interestingly, I've seen the same lockups on the machine that this one replaced (an 8 GB Raspberry Pi 4, netbsd-10). These machines are public facing and are routing parts of a class C over tinc tunnels.

Here's one lockup:

[ 495715.4076245] fatal breakpoint trap in supervisor mode
[ 495715.4076245] trap type 1 code 0 rip 0xffffffff80235385 cs 0x8 rflags 0x202 cr2 0x76f4a2074000 ilevel 0x8 rsp 0xffffa80839aac8c8
[ 495715.4076245] curlwp 0xffffa0ed91107480 pid 0.3 lowest kstack 0xffffa80839aa82c0
Stopped in pid 0.3 (system) at  netbsd:breakpoint+0x5:  leave
breakpoint() at netbsd:breakpoint+0x5
comintr() at netbsd:comintr+0x7e0
intr_kdtrace_wrapper() at netbsd:intr_kdtrace_wrapper+0x26
Xhandle_ioapic_edge1() at netbsd:Xhandle_ioapic_edge1+0x75
--- interrupt ---
npf_tcpsaw() at netbsd:npf_tcpsaw+0x1d
npf_conn_inspect() at netbsd:npf_conn_inspect+0x86
npfk_packet_handler() at netbsd:npfk_packet_handler+0x18e
pfil_run_hooks() at netbsd:pfil_run_hooks+0x128
ip_output() at netbsd:ip_output+0x4c0
ip_forward() at netbsd:ip_forward+0x138
ipintr() at netbsd:ipintr+0xa80
softint_dispatch() at netbsd:softint_dispatch+0x95
DDB lost frame for netbsd:Xsoftintr+0x4c, trying 0xffffa80839aad0f0
Xsoftintr() at netbsd:Xsoftintr+0x4c
--- interrupt ---
b31c059c10208e97:
ds          c9a0
es          ddb3
fs          1
gs          e8d9
rdi         ffffffff81845120    x86_io
rsi         800
rbp         ffffa80839aac8c8
rbx         ffffa8003df8c01c
rdx         7f
rcx         22
rax         1
r8          ffffa80839aaca94
r9          0
r10         5ed7b6ca02a0
r11         ffffa8003df91008
r12         ffffa0e6944a1790
r13         800
r14         cc
r15         ffffa0e6944a16c0
rip         ffffffff80235385    breakpoint+0x5
cs          8
rflags      202
rsp         ffffa80839aac8c8
ss          0
netbsd:breakpoint+0x5:  leave


Does anyone have any suggestions about what to try next? Does anyone want to have a look around themselves?

Thanks,
John Klos

