NetBSD-Bugs archive


Re: kern/59411 (deadlock on mbuf pool)



The following reply was made to PR kern/59411; it has been noted by GNATS.

From: Manuel Bouyer <bouyer%antioche.eu.org@localhost>
To: gnats-bugs%netbsd.org@localhost
Cc: kern-bug-people%netbsd.org@localhost, netbsd-bugs%netbsd.org@localhost, gnats-admin%netbsd.org@localhost,
        riastradh%NetBSD.org@localhost, mrg%NetBSD.org@localhost, chs%NetBSD.org@localhost,
        christos%NetBSD.org@localhost, tnn%NetBSD.org@localhost
Subject: Re: kern/59411 (deadlock on mbuf pool)
Date: Sat, 17 May 2025 00:06:09 +0200

 On Fri, May 16, 2025 at 11:50:07AM +0200, Manuel Bouyer wrote:
 > [...]
 > 
 > I will reboot the affected server this evening; I need to wait at least
 > until Monday evening to be sure the issue isn't back.
 
 I quickly found out why we can't mutex_exit()/mutex_enter() in
 pool_grow() in the !PR_WAITOK case. The server hung after a few
 minutes with the new kernel; this time the deadlock involves a single CPU:
 login: [ 572.5831378] fatal breakpoint trap in supervisor mode
 [ 572.5831378] trap type 1 code 0 rip 0xffffffff80232835 cs 0x8 rflags 0x202 cr8
 [ 572.5831378] curlwp 0xffffcbeba015d480 pid 0.3 lowest kstack 0xffffa91b20d7f20
 Stopped in pid 0.3 (system) at  netbsd:breakpoint+0x5:  leave
 breakpoint() at netbsd:breakpoint+0x5
 comintr() at netbsd:comintr+0x7e0
 intr_wrapper() at netbsd:intr_wrapper+0x4b
 Xhandle_ioapic_edge2() at netbsd:Xhandle_ioapic_edge2+0x6f 
 --- interrupt ---
 mutex_enter() at netbsd:mutex_enter+0x11
 pool_get() at netbsd:pool_get+0x3c7
 pool_cache_get_slow() at netbsd:pool_cache_get_slow+0x139  
 pool_cache_get_paddr() at netbsd:pool_cache_get_paddr+0x233
 m_get() at netbsd:m_get+0x37  
 m_gethdr() at netbsd:m_gethdr+0x9
 bge_fill_rx_ring_std() at netbsd:bge_fill_rx_ring_std+0x13c
 bge_intr() at netbsd:bge_intr+0x9ed
 intr_wrapper() at netbsd:intr_wrapper+0x4b
 intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x1f
 Xhandle_ioapic_edge18() at netbsd:Xhandle_ioapic_edge18+0x6f
 --- interrupt ---
 --- interrupt ---
 Xspllower() at netbsd:Xspllower+0xe
 uvm_km_kmem_alloc() at netbsd:uvm_km_kmem_alloc+0x53
 pool_page_alloc() at netbsd:pool_page_alloc+0x2c
 pool_grow() at netbsd:pool_grow+0x253
 pool_get() at netbsd:pool_get+0x3c7
 pool_cache_get_slow() at netbsd:pool_cache_get_slow+0x139
 pool_cache_get_paddr() at netbsd:pool_cache_get_paddr+0x233
 m_get() at netbsd:m_get+0x37
 m_copy_internal() at netbsd:m_copy_internal+0x13e
 tcp4_segment() at netbsd:tcp4_segment+0x1f9
 ip_tso_output() at netbsd:ip_tso_output+0x24
 ip_output() at netbsd:ip_output+0x18c4
 tcp_output() at netbsd:tcp_output+0x165e
 tcp_input() at netbsd:tcp_input+0xfd5
 ipintr() at netbsd:ipintr+0x8f1       
 softint_dispatch() at netbsd:softint_dispatch+0x11c
 DDB lost frame for netbsd:Xsoftintr+0x4c, trying 0xffffa91b20d840f0    
 Xsoftintr() at netbsd:Xsoftintr+0x4c  
 --- interrupt ---
 
 The soft interrupt set PR_GROWINGNOWAIT before calling pool_allocator_alloc().
 Then a hardware interrupt comes in on the same CPU and calls pool_get() again;
 because PR_GROWINGNOWAIT is set, it busy-waits for the flag to clear.
 The soft interrupt it preempted never gets a chance to run and clear it.
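 
 To make the sequence concrete, here is a minimal user-space sketch of it in C.
 It is not the subr_pool.c code: every model_* name, the struct layout and
 SPIN_LIMIT are made up for illustration, the hardware interrupt is simulated
 as a nested function call, and the busy-wait is bounded only so that the
 model terminates (the real one is not).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the scenario; none of these names exist in NetBSD. */
struct model_pool {
	bool locked;		/* stands in for the pool mutex */
	bool growing_nowait;	/* stands in for PR_GROWINGNOWAIT */
	int nitems;		/* free items currently in the pool */
};

struct model_result {
	bool outer_ok;		/* did the interrupted (softint) get succeed? */
	int nested_spins;	/* how long the nested (interrupt) get spun */
};

enum { SPIN_LIMIT = 1000 };	/* assumption: the real busy-wait is unbounded */

static bool interrupt_pending;	/* set to deliver one simulated interrupt */
static int spins_in_interrupt;	/* recorded by the nested get when it gives up */

static bool model_pool_get(struct model_pool *pp);

/* Simulated page allocation: the window in which an interrupt can arrive. */
static void
model_page_alloc(struct model_pool *pp)
{
	if (interrupt_pending) {
		interrupt_pending = false;
		/* The handler runs on top of the interrupted context. */
		(void)model_pool_get(pp);
	}
}

static bool
model_pool_get(struct model_pool *pp)
{
	int spins = 0;

	assert(!pp->locked);	/* the lock was released, so no mutex recursion */
	pp->locked = true;
	while (pp->nitems == 0) {
		if (pp->growing_nowait) {
			/* Someone is already growing the pool: drop the lock and spin. */
			pp->locked = false;
			if (++spins >= SPIN_LIMIT) {
				/* In the kernel this loop would never end. */
				spins_in_interrupt = spins;
				return false;
			}
			pp->locked = true;
			continue;
		}
		pp->growing_nowait = true;
		pp->locked = false;	/* the experimental unlock around the allocator */
		model_page_alloc(pp);
		pp->locked = true;
		pp->growing_nowait = false;	/* reached only after the handler returns */
		pp->nitems += 4;
	}
	pp->nitems--;
	pp->locked = false;
	return true;
}

/* Run the scenario once: empty pool, one interrupt during the allocation. */
static struct model_result
model_run_scenario(void)
{
	struct model_pool pool = { false, false, 0 };
	struct model_result r;

	interrupt_pending = true;
	spins_in_interrupt = 0;
	r.outer_ok = model_pool_get(&pool);
	r.nested_spins = spins_in_interrupt;
	return r;
}
```

 In this model the nested call gives up only because of the artificial
 SPIN_LIMIT. The flag is cleared only after the nested call returns, so
 without the bound the outer context would never resume.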
 
 I'm back to the previous kernel, where we don't release the lock in the
 !PR_WAITOK case and don't call pr_drain_hook from pool_allocator_alloc()
 either.
 
 -- 
 Manuel Bouyer <bouyer%antioche.eu.org@localhost>
      NetBSD: 26 years of experience will always make the difference
 --
 

