NetBSD-Bugs archive
Re: kern/59411 (deadlock on mbuf pool)
The following reply was made to PR kern/59411; it has been noted by GNATS.
From: Manuel Bouyer <bouyer%antioche.eu.org@localhost>
To: gnats-bugs%netbsd.org@localhost
Cc: kern-bug-people%netbsd.org@localhost, netbsd-bugs%netbsd.org@localhost, gnats-admin%netbsd.org@localhost,
riastradh%NetBSD.org@localhost, mrg%NetBSD.org@localhost, chs%NetBSD.org@localhost,
christos%NetBSD.org@localhost, tnn%NetBSD.org@localhost
Subject: Re: kern/59411 (deadlock on mbuf pool)
Date: Sat, 17 May 2025 00:06:09 +0200
On Fri, May 16, 2025 at 11:50:07AM +0200, Manuel Bouyer wrote:
> [...]
>
> I will reboot the affected server this evening; need to wait at least
> until Monday evening to be sure the issue isn't back.
I quickly found out why we can't mutex_exit()/mutex_enter() in
pool_grow() in the !PR_WAITOK case. The server hung after a few
minutes with the new kernel; this time the deadlock involves a single CPU:
login: [ 572.5831378] fatal breakpoint trap in supervisor mode
[ 572.5831378] trap type 1 code 0 rip 0xffffffff80232835 cs 0x8 rflags 0x202 cr8
[ 572.5831378] curlwp 0xffffcbeba015d480 pid 0.3 lowest kstack 0xffffa91b20d7f20
Stopped in pid 0.3 (system) at netbsd:breakpoint+0x5: leave
breakpoint() at netbsd:breakpoint+0x5
comintr() at netbsd:comintr+0x7e0
intr_wrapper() at netbsd:intr_wrapper+0x4b
Xhandle_ioapic_edge2() at netbsd:Xhandle_ioapic_edge2+0x6f
--- interrupt ---
mutex_enter() at netbsd:mutex_enter+0x11
pool_get() at netbsd:pool_get+0x3c7
pool_cache_get_slow() at netbsd:pool_cache_get_slow+0x139
pool_cache_get_paddr() at netbsd:pool_cache_get_paddr+0x233
m_get() at netbsd:m_get+0x37
m_gethdr() at netbsd:m_gethdr+0x9
bge_fill_rx_ring_std() at netbsd:bge_fill_rx_ring_std+0x13c
bge_intr() at netbsd:bge_intr+0x9ed
intr_wrapper() at netbsd:intr_wrapper+0x4b
intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x1f
Xhandle_ioapic_edge18() at netbsd:Xhandle_ioapic_edge18+0x6f
--- interrupt ---
--- interrupt ---
Xspllower() at netbsd:Xspllower+0xe
uvm_km_kmem_alloc() at netbsd:uvm_km_kmem_alloc+0x53
pool_page_alloc() at netbsd:pool_page_alloc+0x2c
pool_grow() at netbsd:pool_grow+0x253
pool_get() at netbsd:pool_get+0x3c7
pool_cache_get_slow() at netbsd:pool_cache_get_slow+0x139
pool_cache_get_paddr() at netbsd:pool_cache_get_paddr+0x233
m_get() at netbsd:m_get+0x37
m_copy_internal() at netbsd:m_copy_internal+0x13e
tcp4_segment() at netbsd:tcp4_segment+0x1f9
ip_tso_output() at netbsd:ip_tso_output+0x24
ip_output() at netbsd:ip_output+0x18c4
tcp_output() at netbsd:tcp_output+0x165e
tcp_input() at netbsd:tcp_input+0xfd5
ipintr() at netbsd:ipintr+0x8f1
softint_dispatch() at netbsd:softint_dispatch+0x11c
DDB lost frame for netbsd:Xsoftintr+0x4c, trying 0xffffa91b20d840f0
Xsoftintr() at netbsd:Xsoftintr+0x4c
--- interrupt ---
The soft interrupt set PR_GROWINGNOWAIT before calling pool_allocator_alloc().
Then a hardware interrupt (bge_intr() here) came in on the same CPU and called
pool_get() again; because PR_GROWINGNOWAIT was set, it busy-waits for the flag
to clear. Since the interrupt handler is running on top of the soft interrupt,
the soft interrupt never gets a chance to clear it.
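To make the sequence in the trace concrete, here is a rough single-threaded
model of it. This is only a sketch, not the real subr_pool.c logic: every name
in it (model_pool, model_pool_get, MODEL_GROWINGNOWAIT, MODEL_SPIN_LIMIT,
MODEL_ITEMS_PER_PAGE) is made up for illustration. The nested call stands in
for bge_intr() preempting the soft interrupt on the same CPU, and the spin
limit exists only so the model terminates; the real busy-wait has no such
bound.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; not the real pool(9) definitions. */
#define MODEL_GROWINGNOWAIT	0x1u
#define MODEL_SPIN_LIMIT	1000	/* model artifact: real wait is unbounded */
#define MODEL_ITEMS_PER_PAGE	8

enum model_result { MODEL_OK, MODEL_DEADLOCK };

struct model_pool {
	unsigned flags;
	int nitems;
};

/*
 * intr_during_grow: if true, a hardware interrupt arrives on the same CPU
 * while the pool is growing, and its handler calls back into the pool.
 */
static enum model_result
model_pool_get(struct model_pool *pp, bool intr_during_grow)
{
	/* Another context is growing the pool: busy-wait for it to finish. */
	if (pp->flags & MODEL_GROWINGNOWAIT) {
		for (int i = 0; i < MODEL_SPIN_LIMIT; i++) {
			if ((pp->flags & MODEL_GROWINGNOWAIT) == 0)
				break;
		}
		/* Nothing else can run on this CPU, so the flag never clears. */
		if (pp->flags & MODEL_GROWINGNOWAIT)
			return MODEL_DEADLOCK;
	}

	if (pp->nitems == 0) {
		pp->flags |= MODEL_GROWINGNOWAIT;

		/* Lock dropped, page allocator running: interrupts can come in. */
		if (intr_during_grow &&
		    model_pool_get(pp, false) == MODEL_DEADLOCK)
			return MODEL_DEADLOCK;	/* the grower is never resumed */

		pp->nitems += MODEL_ITEMS_PER_PAGE;
		pp->flags &= ~MODEL_GROWINGNOWAIT;
	}

	assert(pp->nitems > 0);
	pp->nitems--;
	return MODEL_OK;
}
```

Calling model_pool_get() on an empty pool with intr_during_grow set to false
returns MODEL_OK; with it set to true, the nested call spins on a flag that
only the interrupted caller could clear, and the model reports MODEL_DEADLOCK
with the flag still set.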
I'm back to the previous kernel, which doesn't release the lock in the
!PR_WAITOK case and doesn't call pr_drain_hook from pool_allocator_alloc()
either.
--
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
NetBSD: 26 years of experience will always make the difference