NetBSD-Bugs archive


Re: kern/59411 (deadlock on mbuf pool)



The following reply was made to PR kern/59411; it has been noted by GNATS.

From: Manuel Bouyer <bouyer%antioche.eu.org@localhost>
To: Taylor R Campbell <riastradh%NetBSD.org@localhost>
Cc: gnats-bugs%netbsd.org@localhost, kern-bug-people%netbsd.org@localhost, netbsd-bugs%netbsd.org@localhost,
        gnats-admin%netbsd.org@localhost, mrg%NetBSD.org@localhost, chs%NetBSD.org@localhost,
        christos%NetBSD.org@localhost, tnn%NetBSD.org@localhost, wiz%NetBSD.org@localhost
Subject: Re: kern/59411 (deadlock on mbuf pool)
Date: Mon, 19 May 2025 15:18:09 +0200

 On Mon, May 19, 2025 at 01:00:29PM +0000, Taylor R Campbell wrote:
 > > Date: Mon, 19 May 2025 10:17:57 +0200
 > > From: Manuel Bouyer <bouyer%antioche.eu.org@localhost>
 > > 
 > > Back to the initial patch in this PR.
 > > 
 > > I guess running the whole pool_grow() at splvm(), including the busy_wait
 > > on PR_GROWINGNOWAIT would work (then we could busy-wait even when called from
 > > interrupt context), but I'm not sure it's better than my initial patch.
 > 
 > Unfortunately, this won't work because mutex_exit will restore spl to
 > what it was at mutex_enter (unless it was already raised in
 > mutex_enter by holding another spin lock), even if you try something
 > like:
 
 Yes, it would need something like:
 mutex_exit(&pp->pr_lock);
 s = splraiseipl(pp->pr_ipl);
 mutex_enter(&pp->pr_lock);
 
 at the entry of pool_grow(), and the opposite at the exit.
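
To make the sequence above concrete, here is a minimal userland sketch of it. It is a toy model, not kernel code: every `model_` name is hypothetical, the IPL values are arbitrary, and it assumes a single spin mutex whose exit restores the IPL saved at enter (the real kernel tracks this per CPU across nested spin locks). It only illustrates why re-taking the lock after raising the IPL would keep the unlocked window inside pool_grow() at the raised level.

```c
#include <assert.h>

/* Toy userland model; NOT the NetBSD kernel API. All model_ names are hypothetical. */
enum { IPL_NONE = 0, IPL_VM = 4 };      /* arbitrary values for the model */

static int cur_ipl = IPL_NONE;          /* models the CPU's current IPL */

struct model_mutex {
	int ipl;        /* IPL the spin mutex was initialised with */
	int saved_ipl;  /* IPL in effect when the mutex was taken */
	int held;
};

/* Raise the IPL if needed and return the previous level. */
static int model_splraiseipl(int ipl)
{
	int old = cur_ipl;
	if (ipl > cur_ipl)
		cur_ipl = ipl;
	return old;
}

static void model_splx(int s) { cur_ipl = s; }

static void model_mutex_enter(struct model_mutex *m)
{
	m->saved_ipl = model_splraiseipl(m->ipl);
	m->held = 1;
}

/* Exit restores whatever IPL was in effect at enter time. */
static void model_mutex_exit(struct model_mutex *m)
{
	assert(m->held);
	m->held = 0;
	cur_ipl = m->saved_ipl;
}

/* Called with the mutex held; returns the IPL seen while it is dropped. */
static int model_grow_naive(struct model_mutex *m)
{
	int seen;

	model_mutex_exit(m);            /* IPL falls back to the caller's level */
	seen = cur_ipl;                 /* stands in for the backing allocation */
	model_mutex_enter(m);
	return seen;
}

/* Same, but with the exit/raise/enter sequence at entry and its opposite at exit. */
static int model_grow_raised(struct model_mutex *m)
{
	int s, seen;

	model_mutex_exit(m);
	s = model_splraiseipl(m->ipl);
	model_mutex_enter(m);           /* saved IPL is now the raised one */

	model_mutex_exit(m);            /* IPL stays raised across this window */
	seen = cur_ipl;
	model_mutex_enter(m);

	model_mutex_exit(m);
	model_splx(s);                  /* back to the caller's level ... */
	model_mutex_enter(m);           /* ... so the caller's exit restores it */
	return seen;
}
```

In this model, a caller that takes the mutex at IPL_NONE sees IPL_NONE during the unlocked window of model_grow_naive() and IPL_VM during that of model_grow_raised(). Note that there is still a short window at the lower IPL between the first exit and the raise.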
 
 > [...]
 > I think your initial patch (assuming you mean 1.293, and then just
 > deleting the whole PR_GROWINGNOWAIT machinery)
 
 No, I mean the patch in the first mail in the PR, which skips
 pr_drain_hook in pool_allocator_alloc() in the !PR_WAITOK case.
 
 > is likely to be asking
 > for trouble by continuing to hold the lock across the backing
 > allocator -- and, perhaps worse, across the backing allocator's free
 > routine, which sometimes issues a cross-call that requires all other
 > CPUs to be responsive.
 
 In the current code, PR_GROWINGNOWAIT has not done anything since rev 1.220
 (2017/12/29). NetBSD 8.0_RELEASE already included this change.
 
 rev 1.293 changed a branch that is, AFAIK, never taken, to a KASSERT().
 
 -- 
 Manuel Bouyer <bouyer%antioche.eu.org@localhost>
      NetBSD: 26 years of experience will always make the difference
 --
 

