NetBSD-Users archive
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]
Re: [10.0_STABLE] Hard lock
On Sun, Nov 24, 2024 at 09:30:08AM +0100, BERTRAND Joël wrote:
That is still some kind of a double-free that shouldn't happen.
> Nov 24 04:31:42 legendre /netbsd: [ 195210.6117279] S3C1: freeing UNUSED pdu
> Nov 24 04:31:42 legendre /netbsd: [ 195605.2522802] S3C1: freeing UNUSED pdu
> Nov 24 04:31:42 legendre /netbsd: [ 196621.8736806] S3C1: freeing UNUSED pdu
> Nov 24 04:31:42 legendre /netbsd: [ 197170.5744073] S3C1: freeing UNUSED pdu
> Nov 24 04:31:42 legendre /netbsd: [ 197263.6445312] S3C1: freeing UNUSED pdu
> Nov 24 04:31:42 legendre /netbsd: [ 197369.3246735] S3C1: freeing UNUSED pdu
> Nov 24 04:31:42 legendre /netbsd: [ 198077.3456182] S3C1: freeing UNUSED pdu
> Nov 24 04:31:42 legendre /netbsd: [ 200062.4366678] ccb_timeout: num=1 total=0 disp=0 invalid ccb=0xffff928021d2d408
CPU0 (executing the softnet thread) has kernel lock taken and waits for the mbuf pool lock.
> Nov 24 04:31:42 legendre /netbsd: [ 207938.8837560] cpu0[3 softnet/0]: hogging kernel lock
> Nov 24 04:31:42 legendre /netbsd: [ 207938.8837560] ipi_msg_cpu_handler() at netbsd:ipi_msg_cpu_handler+0x68
> Nov 24 04:31:42 legendre /netbsd: [ 207938.8837560] ipi_cpu_handler() at netbsd:ipi_cpu_handler+0x99
> Nov 24 04:31:42 legendre /netbsd: [ 207938.8837560] x86_ipi_handler() at netbsd:x86_ipi_handler+0x79
> Nov 24 04:31:42 legendre /netbsd: [ 207938.8837560] Xresume_lapic_ipi() at netbsd:Xresume_lapic_ipi+0x18
> Nov 24 04:31:42 legendre /netbsd: [ 207938.8837560] --- interrupt ---
> Nov 24 04:31:42 legendre /netbsd: [ 207938.8837560] mutex_enter() at netbsd:mutex_enter+0x50f
> Nov 24 04:31:42 legendre /netbsd: [ 207938.8837560] pool_get() at netbsd:pool_get+0x78
> Nov 24 04:31:42 legendre /netbsd: [ 207938.8837560] pool_cache_get_slow() at netbsd:pool_cache_get_slow+0x13d
> Nov 24 04:31:42 legendre /netbsd: [ 207938.8837560] pool_cache_get_paddr() at netbsd:pool_cache_get_paddr+0x264
> Nov 24 04:31:42 legendre /netbsd: [ 207938.8837560] m_get() at netbsd:m_get+0x37
> Nov 24 04:31:42 legendre /netbsd: [ 207938.8837560] m_gethdr() at netbsd:m_gethdr+0x9
> Nov 24 04:31:42 legendre /netbsd: [ 207938.8837560] tcp_output() at netbsd:tcp_output+0x135c
> Nov 24 04:31:42 legendre /netbsd: [ 207938.8837560] tcp_input() at netbsd:tcp_input+0x29dc
> Nov 24 04:31:42 legendre /netbsd: [ 207938.8837560] ipintr() at netbsd:ipintr+0x993
> Nov 24 04:31:42 legendre /netbsd: [ 207938.8837560] softint_dispatch() at netbsd:softint_dispatch+0x11c
This shows the system crashing for a kernel lock spinout, according to the "hogging kernel lock" message
the kernel lock is held by the code above for more than 10 seconds.
> Nov 24 04:31:42 legendre /netbsd: [ 207938.8837560] DDB lost frame for netbsd:Xsoftintr+0x4c, trying 0xffff9284371ae0f0
> Nov 24 04:31:42 legendre /netbsd: [ 207938.8837560] Xsoftintr() at netbsd:Xsoftintr+0x4c
> Nov 24 04:31:42 legendre /netbsd: [ 207938.8837560] --- interrupt ---
> Nov 24 04:31:42 legendre /netbsd: [ 207938.8837560] 4406abff750ac8e5:
> Nov 24 04:31:42 legendre /netbsd: [ 207938.8998476] Kernel lock error: _kernel_lock,266: spinout
> Nov 24 04:31:42 legendre /netbsd:
> Nov 24 04:31:42 legendre /netbsd: [ 207938.8998476] lock address : netbsd:kernel_lock
> Nov 24 04:31:42 legendre /netbsd: [ 207938.8998476] type : spin
> Nov 24 04:31:42 legendre /netbsd: [ 207938.8998476] initialized : netbsd:main+0x72
> Nov 24 04:31:42 legendre /netbsd: [ 207938.8998476] shared holds : 0 exclusive: 1
> Nov 24 04:31:42 legendre /netbsd: [ 207938.8998476] shares wanted: 0 exclusive: 5
> Nov 24 04:31:42 legendre /netbsd: [ 207938.8998476] relevant cpu : 1 last held: 0
> Nov 24 04:31:42 legendre /netbsd: [ 207938.8998476] relevant lwp : 0xffff8041b670e640 last held: 0xffff8044b4e1d480
> Nov 24 04:31:42 legendre /netbsd: [ 207938.8998476] last locked* : netbsd:ipintr+0x44
> Nov 24 04:31:42 legendre /netbsd: [ 207938.8998476] unlocked : netbsd:tcp_rcvd_wrapper+0x51
> Nov 24 04:31:42 legendre /netbsd: [ 207938.8998476] curcpu holds : 0 wanted by: 0xffff8041b670e640
> Nov 24 04:31:42 legendre /netbsd:
CPU1 (executing the wm driver) needs to allocate a mbuf and needs to grow the mbuf pool. Growing failed and it
tries to drain the pool which then waits for the kernel lock. When waiting longer than 10 seconds ("spinout"),
it panics the system.
> Nov 24 04:31:42 legendre /netbsd: [ 207938.8998476] panic: LOCKDEBUG: Kernel lock error: _kernel_lock,266: spinout
> Nov 24 04:31:42 legendre /netbsd: [ 207938.8998476] cpu1: Begin traceback...
> Nov 24 04:31:42 legendre /netbsd: [ 207938.8998476] vpanic() at netbsd:vpanic+0x183
> Nov 24 04:31:42 legendre /netbsd: [ 207938.8998476] panic() at netbsd:panic+0x3c
> Nov 24 04:31:42 legendre /netbsd: [ 207938.9037543] lockdebug_abort1() at netbsd:lockdebug_abort1+0xe6
> Nov 24 04:31:42 legendre /netbsd: [ 207938.9037543] _kernel_lock() at netbsd:_kernel_lock+0x2a7
> Nov 24 04:31:42 legendre /netbsd: [ 207938.9037543] mb_drain() at netbsd:mb_drain+0x17
> Nov 24 04:31:42 legendre /netbsd: [ 207938.9037543] pool_grow() at netbsd:pool_grow+0x3b9
> Nov 24 04:31:42 legendre /netbsd: [ 207938.9037543] pool_get() at netbsd:pool_get+0x3e5
> Nov 24 04:31:42 legendre /netbsd: [ 207938.9037543] pool_cache_get_slow() at netbsd:pool_cache_get_slow+0x13d
> Nov 24 04:31:42 legendre /netbsd: [ 207938.9037543] pool_cache_get_paddr() at netbsd:pool_cache_get_paddr+0x264
> Nov 24 04:31:42 legendre /netbsd: [ 207938.9037543] m_get() at netbsd:m_get+0x37
> Nov 24 04:31:42 legendre /netbsd: [ 207938.9037543] m_gethdr() at netbsd:m_gethdr+0x9
> Nov 24 04:31:42 legendre /netbsd: [ 207938.9137542] wm_add_rxbuf() at netbsd:wm_add_rxbuf+0x3a
> Nov 24 04:31:42 legendre /netbsd: [ 207938.9137542] wm_rxeof() at netbsd:wm_rxeof+0x114
> Nov 24 04:31:42 legendre /netbsd: [ 207938.9137542] wm_handle_queue() at netbsd:wm_handle_queue+0xff
> Nov 24 04:31:42 legendre /netbsd: [ 207938.9137542] softint_dispatch() at netbsd:softint_dispatch+0x11c
> Nov 24 04:31:42 legendre /netbsd: [ 207938.9137542] DDB lost frame for netbsd:Xsoftintr+0x4c, trying 0xffff9284376cd0f0
> Nov 24 04:31:42 legendre /netbsd: [ 207938.9137542] Xsoftintr() at netbsd:Xsoftintr+0x4c
> Nov 24 04:31:42 legendre /netbsd: [ 207938.9137542] --- interrupt ---
> Nov 24 04:31:42 legendre /netbsd: [ 207938.9137542] 0:
> Nov 24 04:31:42 legendre /netbsd: [ 207938.9137542] cpu1: End traceback...
> Nov 24 04:31:42 legendre /netbsd:
> Nov 24 04:31:42 legendre /netbsd: [ 207938.9137542] dumping to dev 18,1 (offset=253015, size=4162677):
My guess is that when trying to allocate an mbuf the pool_get() function gets into a long lasting (possibly infinite)
loop, even when called with PR_NOWAIT.
Greetings,
--
Michael van Elst
Internet: mlelstv%serpens.de@localhost
"A potential Snark may lurk in every tree."
Home |
Main Index |
Thread Index |
Old Index