NetBSD-Bugs archive
Re: kern/59411 (deadlock on mbuf pool)
The following reply was made to PR kern/59411; it has been noted by GNATS.
From: Manuel Bouyer <bouyer%antioche.eu.org@localhost>
To: Taylor R Campbell <riastradh%NetBSD.org@localhost>
Cc: gnats-bugs%netbsd.org@localhost, kern-bug-people%netbsd.org@localhost, netbsd-bugs%netbsd.org@localhost,
gnats-admin%netbsd.org@localhost, mrg%NetBSD.org@localhost, chs%NetBSD.org@localhost,
christos%NetBSD.org@localhost, tnn%NetBSD.org@localhost
Subject: Re: kern/59411 (deadlock on mbuf pool)
Date: Mon, 19 May 2025 10:17:57 +0200
On Sat, May 17, 2025 at 12:51:26AM +0000, Taylor R Campbell wrote:
> Can you try the attached patch?
>
> And, can you record some dtrace output to see where this busy-wait
> logic takes effect?
>
> dtrace -n '
> sdt:::pool_grow* { @[probename, stack()] = count() }
> tick-10s { printa(@) }
> '
> # HG changeset patch
> # User Taylor R Campbell <riastradh%NetBSD.org@localhost>
> # Date 1747442650 0
> # Sat May 17 00:44:10 2025 +0000
> # Branch trunk
> # Node ID f8a9f7969e74c93df131b0d1c86775697fa66e62
> # Parent e65a3a130b666dadcc0a084f312f8bb66063fa65
> # EXP-Topic riastradh-pr59411-mbufpooldeadlock
> pool(9): Don't busy-wait in pool_grow with PR_NOWAIT from (soft)intr.
>
> PR kern/59411: deadlock on mbuf pool
With this patch I again got a deadlock on kernel_lock; here are
the relevant stack traces:
[ 148230.3394940] fatal breakpoint trap in supervisor mode
[ 148230.3394940] trap type 1 code 0 rip 0xffffffff80232835 cs 0x8 rflags 0x2028
[ 148230.3394940] curlwp 0xfffff273b01fa4c0 pid 2652.2652 lowest kstack 0xffffb0
Stopped in pid 2652.2652 (nginx) at netbsd:breakpoint+0x5: leave
breakpoint() at netbsd:breakpoint+0x5
comintr() at netbsd:comintr+0x7e0
intr_wrapper() at netbsd:intr_wrapper+0x4b
Xhandle_ioapic_edge2() at netbsd:Xhandle_ioapic_edge2+0x6f
--- interrupt ---
_kernel_lock() at netbsd:_kernel_lock+0xdc
intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x17
Xhandle_ioapic_edge18() at netbsd:Xhandle_ioapic_edge18+0x6f
--- interrupt ---
Xspllower() at netbsd:Xspllower+0xe
pool_get() at netbsd:pool_get+0x3c7
pool_cache_get_slow() at netbsd:pool_cache_get_slow+0x139
pool_cache_get_paddr() at netbsd:pool_cache_get_paddr+0x233
m_clget() at netbsd:m_clget+0x2b
sosend() at netbsd:sosend+0x489
soo_write() at netbsd:soo_write+0x2f
dofilewrite() at netbsd:dofilewrite+0x80
sys_write() at netbsd:sys_write+0x49
syscall() at netbsd:syscall+0x196
--- syscall (number 4) ---
db{0}> mach cpu 2
using CPU 2
db{0}> tr
mutex_vector_enter() at netbsd:mutex_vector_enter+0x198
pool_grow() at netbsd:pool_grow+0x4e2
pool_get() at netbsd:pool_get+0x3c7
pool_cache_get_slow() at netbsd:pool_cache_get_slow+0x139
pool_cache_get_paddr() at netbsd:pool_cache_get_paddr+0x233
m_clget() at netbsd:m_clget+0x2b
m_copyback_internal() at netbsd:m_copyback_internal+0x694
ipf_check_wrapper() at netbsd:ipf_check_wrapper+0x2a
pfil_run_hooks() at netbsd:pfil_run_hooks+0x154
ip_output() at netbsd:ip_output+0x488
tcp_output() at netbsd:tcp_output+0x165e
tcp_send_wrapper() at netbsd:tcp_send_wrapper+0x63
sosend() at netbsd:sosend+0x961
soo_write() at netbsd:soo_write+0x2f
dofilewrite() at netbsd:dofilewrite+0x80
sys_write() at netbsd:sys_write+0x49
syscall() at netbsd:syscall+0x196
--- syscall (number 4) ---
On CPU 0, pool_get() calls pool_catchup(), which calls pool_grow()
with PR_NOWAIT. pool_grow() sets PR_GROWING | PR_GROWINGNOWAIT.
CPU 2 holds kernel_lock when it calls m_clget(). It sees
PR_GROWING | PR_GROWINGNOWAIT and spins.
CPU 0 releases pr_lock, takes an interrupt, and waits on kernel_lock,
which CPU 2 will not release while it spins.
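To make the cycle in the sequence above explicit, here is a small
user-space model of it as a wait-for graph. This is only a sketch, not
kernel code: the struct names, the helper in_wait_cycle() and the
function pr59411_deadlocks() are made up for illustration, and the
"resource" called GROWING_NOWAIT merely stands for "whoever must clear
PR_GROWINGNOWAIT", as I read the two backtraces.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model, not kernel code: each actor waits for at most one
 * resource, and each resource is held by at most one actor.
 */
enum { NOBODY = -1 };

struct actor {
	const char *name;
	int waits_for;		/* index into resources[], or NOBODY */
};

struct resource {
	const char *name;
	int held_by;		/* index into actors[], or NOBODY */
};

/*
 * Follow waiter -> resource -> holder edges starting at `start`.
 * Returns true only if the chain leads back to `start`, i.e. `start`
 * is itself part of a cycle and its wait can never be satisfied.
 */
static bool
in_wait_cycle(const struct actor *actors, size_t nactors,
    const struct resource *resources, int start)
{
	int cur = start;

	for (size_t step = 0; step < nactors; step++) {
		int r = actors[cur].waits_for;
		if (r == NOBODY)
			return false;
		cur = resources[r].held_by;
		if (cur == NOBODY)
			return false;
		if (cur == start)
			return true;
	}
	return false;
}

enum { CPU0, CPU2, NCPU_MODEL };
enum { KERNEL_LOCK, GROWING_NOWAIT, NRES_MODEL };

/*
 * CPU 0 set PR_GROWINGNOWAIT and is now blocked on kernel_lock in an
 * interrupt handler; CPU 2 spins until PR_GROWINGNOWAIT is cleared.
 */
static bool
pr59411_deadlocks(bool cpu2_holds_kernel_lock)
{
	const struct actor actors[NCPU_MODEL] = {
		[CPU0] = { "cpu0", KERNEL_LOCK },
		[CPU2] = { "cpu2", GROWING_NOWAIT },
	};
	const struct resource resources[NRES_MODEL] = {
		[KERNEL_LOCK] = { "kernel_lock",
		    cpu2_holds_kernel_lock ? CPU2 : NOBODY },
		[GROWING_NOWAIT] = { "PR_GROWINGNOWAIT", CPU0 },
	};

	return in_wait_cycle(actors, NCPU_MODEL, resources, CPU0);
}
```

In this model pr59411_deadlocks(true) reports a cycle and
pr59411_deadlocks(false) does not: the spin is only fatal when the
spinner holds something the grower's interrupt handler needs.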
Back to the initial patch in this PR.
I guess running the whole of pool_grow() at splvm(), including the busy-wait
on PR_GROWINGNOWAIT, would work (then we could busy-wait even when called from
interrupt context), but I'm not sure it's better than my initial patch.
My patch will return failure in more cases, but the caller should be able
to deal with that anyway.
--
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
NetBSD: 26 years of experience will always make the difference