NetBSD-Bugs archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

kern/58531: NetBSD 10.0 deadlock in nd_timer



>Number:         58531
>Category:       kern
>Synopsis:       NetBSD 10.0 deadlock in nd_timer
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Jul 31 15:05:00 +0000 2024
>Originator:     Emmanuel Dreyfus
>Release:        NetBSD 10.0
>Organization:
NetBSD
>Environment:
System: NetBSD 10.0
Architecture: i386
Machine: i386
>Description:
	Without LOCKDEBUG, the kernel freeze a few time a day
	With LOCKDEBUG, I get this panic:

[ 73171.2786945] cpu1[20 softclk/1]: hogging kernel lock
[ 73171.2786945] ipi_msg_cpu_handler(0,0,c84513c0,c09dc4f3,0,c846b000,1,1,dfc26000,dfe73f70) at netbsd:ipi_msg_cpu_handler+0x4c
[ 73171.2886997] ipi_cpu_handler(dfc26000,0,6,80,c84517c0,dfe6ac28,c0102c92,dfe6ab88,0,0) at netbsd:ipi_cpu_handler+0x90
[ 73171.2987044] x86_ipi_handler(dfe6ab88,0,0,0,0,0,0,0,0,0) at netbsd:x86_ipi_handler+0x6b
address 0x34 is invalid
address 0x30 is invalid
[ 73171.3187143] DDB lost frame for netbsd:Xresume_lapic_ipi+0x22, trying 0xdfe73f78
[ 73171.3187143] Xresume_lapic_ipi() at netbsd:Xresume_lapic_ipi+0x22
[ 73171.3287193] --- interrupt ---
[ 73171.3287193] 0:
[ 73171.3287193] Kernel lock error: _kernel_lock,266: spinout

[ 73171.3287193] lock address : netbsd:kernel_lock
[ 73171.3287193] type : spin
[ 73171.3287193] initialized : netbsd:main+0x5c
[ 73171.3287193] shared holds : 0 exclusive: 1
[ 73171.3287193] shares wanted: 0 exclusive: 1
[ 73171.3287193] relevant cpu : 0 last held: 1
[ 73171.3287193] relevant lwp : 0x00000000c856f300 last held: 0x00000000c859a940
[ 73171.3287193] last locked* : netbsd:nd_timer+0x30
[ 73171.3287193] unlocked : netbsd:ipintr+0x1f0
[ 73171.3287193] curcpu holds : 0 wanted by: 0x00000000c856f300

[ 73171.4010615] panic: LOCKDEBUG: Kernel lock error: _kernel_lock,266: spinout
[ 73171.4010615] cpu0: Begin traceback...
[ 73171.4010615] vpanic(c0f368cc,dfd7fc90,dfd7fcbc,c09f6e4d,c0f368cc,c0f2d347,c0e4fe00,10a,c0f308d6,1) at netbsd:vpanic+0x196
[ 73171.4010615] panic(c0f368cc,c0f2d347,c0e4fe00,10a,c0f308d6,1,c0f308d6,10a,c0e4fe00,6f9086) at netbsd:panic+0x18
[ 73171.4010615] lockdebug_abort1(6,c0f308d6,1,c42f513c,c0f308d6,4,0,0,c42e10c0,1) at netbsd:lockdebug_abort1+0xdb
[ 73171.4010615] _kernel_lock(1,c8aafbc4,c8c7d290,dfd7fd54,c0ba15ca,0,0,c92eca00,dfd7fd60,c0993ff7) at netbsd:_kernel_lock+0x2e3
[ 73171.4010615] filter_event(c9345240,c8b3c4c0,c8b3c510,3c,c8b3c4c0,dfd7fdb8,c0a1f69d,c8b3c518,0,2da) at netbsd:filter_event+0x101
[ 73171.4010615] knote(c8b3c518,0,2da,c0e4ff08,c0ab3cf4,6,0,c42e1210,c9345240,c8b3c4c0) at netbsd:knote+0x2e
[ 73171.4010615] selnotify(c8b3c510,0,0,c84518c0,c93df1c0,6,c0b94a90,1,2a,c8d03956) at netbsd:selnotify+0x2b
[ 73171.4010615] bpf_deliver(2a,2a,2,3d5,5,c0a0019b,1000,c92e22c0,1feb0,c8837834) at netbsd:bpf_deliver+0x30c
[ 73171.4010615] wm_send_common_locked(c84517c0,c8451080,dfe005c4,c856f300,c8451080,9ac900,0,c8839010,64,c8837834) at netbsd:wm_send_common_locked+0x41e
[ 73171.4010615] wm_handle_queue(c8839000,1,0,0,0,0,0,0,0,0) at netbsd:wm_handle_queue+0x260
[ 73171.4010615] softint_dispatch(c856f040,4,70796c47,68,63637553,737365,0,0,0,0) at netbsd:softint_dispatch+0xe6
[ 73171.4010615] Bad frame pointer: 0xc8615000
[ 73171.4010615] cpu0: End traceback...
[ 73171.4010615] fatal breakpoint trap in supervisor mode
[ 73171.4010615] trap type 1 code 0 eip 0xc0127eb4 cs 0x8 eflags 0x202 cr2 0xb7127000 ilevel 0x8 esp 0xdfd7fc74
[ 73171.4010615] curlwp 0xc856f300 pid 0 lid 3 lowest kstack 0xdfd7d2c0
Stopped in pid 0.3 (system) at netbsd:breakpoint+0x4: popl %ebp

db{0}> ps
PID LID S CPU FLAGS STRUCT LWP * NAME WAIT
848 848 3 1 180 c9414940 openssl select
334 334 3 0 180 c95d7cc0 sleep nanoslp
16719 16719 3 1 180 c94c6400 xargs wait
4911 4911 3 1 180 c94f3700 sh wait
14642 14642 3 1 180 c93ff900 ksh pause
12211 12211 3 0 180 c95d7a00 tclsh wait
23339 23339 3 0 180 c94c6c40 cron pipe_rd
20690 20690 3 1 180 c93b0080 cron pipe_rd
854 854 3 1 180 c899ab80 getty ttyraw
1115 1115 3 0 180 c92e4d40 cron nanoslp
958 1116 3 1 180 c92e47c0 named parked
958 1113 3 0 180 c8b40a00 named parked
958 1117 3 1 180 c93b0b80 timer parked
958 1105 3 0 180 c93b08c0 isc-net-0003 kqueue
958 1096 3 1 180 c93b0600 isc-net-0002 kqueue
958 812 3 0 180 c93b0340 isc-net-0001 kqueue
958 1013 3 1 180 c9350040 isc-net-0000 kqueue
958 958 3 1 180 c9350300 named sigwait
681 681 3 1 180 c8aed9c0 openvpn netio
680 680 3 1 180 c8b40740 openvpn poll
955 955 3 0 180 c8b401c0 parpd kqueue
828 828 3 1 180 c92e4240 sshd poll
511 511 3 1 180 c8b40480 syslogd kqueue
1 1 3 1 180 c8a7b6c0 init wait
0 181 3 0 200 c899a8c0 physiod physiod
0 204 3 1 200 c8a7bc40 pooldrain pooldrain
0 203 3 1 200 c899a340 ioflush syncer
0 202 3 0 200 c899a600 pgdaemon pgdaemon
0 199 3 0 200 c899a080 swwreboot swwreboot
0 197 3 0 200 c897ab40 usb1 usbevt
0 196 3 1 200 c899e0c0 usb0 usbevt
0 195 3 1 200 c8a7b980 npfgc0 npfgcw
0 194 3 1 200 c8a7b400 rt_free rt_free
0 193 3 1 200 c8a7b140 unpgc unpgc
0 192 3 1 200 c89afc00 key_timehandler key_timehandler
0 184 3 1 200 c89af940 carp_wqinput/1 carp_wqinput
0 183 3 0 200 c89af680 carp_wqinput/0 carp_wqinput
0 182 3 1 200 c89af3c0 icmp_wqinput/1 icmp_wqinput
0 31 3 0 200 c89af100 icmp_wqinput/0 icmp_wqinput
0 63 3 1 200 c899ebc0 rt_timer rt_timer
0 126 3 1 200 c899e900 vmem_rehash vmem_rehash
0 125 3 1 200 c899e640 coretemp1 coretemp1
0 124 3 1 200 c899e380 coretemp0 coretemp0
0 115 3 0 200 c897a880 entbutler entropy
0 114 3 1 240 c897a5c0 atabus0 atath
0 113 3 0 200 c897a300 wm7Reset wm7Reset
0 112 3 1 200 c897a040 wm7TxRx/1 wm7TxRx
0 111 3 0 200 c88f9d40 wm7TxRx/0 wm7TxRx
0 110 3 0 200 c88f9a80 wm6Reset wm6Reset
0 109 3 1 200 c88f97c0 wm6TxRx/1 wm6TxRx
0 108 3 0 200 c88f9500 wm6TxRx/0 wm6TxRx
0 107 3 0 200 c88f9240 wm5Reset wm5Reset
0 106 3 1 200 c88d3d00 wm5TxRx/1 wm5TxRx
0 105 3 0 200 c88d3a40 wm5TxRx/0 wm5TxRx
0 104 3 0 200 c88d3780 wm4Reset wm4Reset
0 1038d3200 wm4TxRx/0 wm4TxRx
0 101 3 0 200 c8894cc0 wm3Reset wm3Reset
0 100 3 1 200 c8894a00 wm3TxRx/1 wm3TxRx
0 99 3 0 200 c8894740 wm3TxRx/0 wm3TxRx
0 98 3 0 200 c8894480 wm2Reset wm2Reset
0 97 3 1 200 c88941c0 wm2TxRx/1 wm2TxRx
0 96 3 0 200 c8812c80 wm2TxR0 c8812440 wm1TxRx/0 wm1TxRx
0 27 3 0 200 c8812180 wm0Reset wm0Reset
0 26 3 1 200 c859dc40 wm0TxRx/1 wm0TxRx
0 25 3 0 200 c859d980 wm0TxRx/0 wm0TxRx
0 24 3 0 200 c859d6c0 usbtask-dr usbtsk
0 23 3 0 200 c859d400 usbtask-hc usbtsk
0 22 3 1 200 c859d140 xcal0 softbio/1
0 21 1 1 200 c859ac00 softser/1
0 > 20 7 1 200 c859a940 softclk/1
0 19 1 1 200 c859a680 softbio/1
0 18 1 1 200 c859a3c0 softnet/1
0 > 17 1 1 201 c859a100 idle/1
0 16 3 0 200 c857fbc0 sysmon smtaskq
0 15 3 0 200 c857f900 pmfsuspend pmfsuspend
0 14 3 0 200 c857f640 pmfevent pmfevent
0 13 3 0 200 c857f380 sopendfree sopendfr
0 12 3 0 200 c857c8571600 vdrain vdrain
0 8 3 0 200 c8571340 modunload mod_unld
0 7 3 0 200 c8571080 xcall/0 xcall
0 6 1 0 200 c856fb40 softser/0
0 5 1 0 200 c856f880 softclk/0
0 4 1 0 200 c856f5c0 softbio/0
0 > 3 7 0 200 c856f300 softnet/0
0 > 2 1 0 201 c856f040 idle/0
0 0 3 1 200 c42f5180 swapper uvm

>How-To-Repeat:
	The machine is a relatively idle router, it happens a few time
	a day.
>Fix:
	boot -1 seems an obvious and effective workaround. I am certain
	we can do better.



Home | Main Index | Thread Index | Old Index