Re: kern/59411 (deadlock on mbuf pool)
> Date: Mon, 19 May 2025 10:17:57 +0200
> From: Manuel Bouyer <bouyer%antioche.eu.org@localhost>
>
> Back to the initial patch in this PR.
>
> I guess running the whole pool_grow() at splvm(), including the busy-wait
> on PR_GROWINGNOWAIT, would work (then we could busy-wait even when called from
> interrupt context), but I'm not sure it's better than my initial patch.

Unfortunately, this won't work, because mutex_exit will restore the spl
to what it was at mutex_enter (unless the IPL had already been raised
at mutex_enter time by holding another spin lock), even if you try
something like:
+ s = splraiseipl(pp->pr_ipl);
mutex_exit(&pp->pr_lock);
storage = pool_allocator_alloc(pp, flags);
...
mutex_enter(&pp->pr_lock);
+ splx(s);
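
To make the failure concrete, here is a toy model (invented names, not
the real kern_mutex.c code) of how a spin mutex handles the IPL: the
mutex remembers the caller's IPL at mutex_enter time, and mutex_exit
restores that remembered value, so an splraiseipl() done while the
mutex is held is simply thrown away when the lock is dropped.

/*
 * Toy model, for illustration only -- not the real kern_mutex.c.
 * All names below are invented.
 */
static int cur_ipl;	/* stand-in for the CPU's current IPL */
static int saved_ipl;	/* IPL remembered by the spin mutex */
static int nlocks;	/* number of spin mutexes currently held */

static void
toy_spin_mutex_enter(int mutex_ipl)
{
	int s = cur_ipl;

	if (cur_ipl < mutex_ipl)
		cur_ipl = mutex_ipl;	/* like splraiseipl() in mutex_enter */
	if (nlocks++ == 0)
		saved_ipl = s;		/* remember the caller's IPL, once */
}

static void
toy_spin_mutex_exit(void)
{
	if (--nlocks == 0)
		cur_ipl = saved_ipl;	/* like splx(): back to enter-time IPL */
}

So in the sequence above, the added splraiseipl() does raise the IPL
for an instant, but mutex_exit(&pp->pr_lock) immediately drops it back
to whatever it was before mutex_enter, and pool_allocator_alloc() ends
up running at the original IPL anyway.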

> My patch will return failure in more cases, but the caller should be able
> to deal with that anyway.

I think your initial patch (assuming you mean 1.293, and then just
deleting the whole PR_GROWINGNOWAIT machinery) is likely to be asking
for trouble by continuing to hold the lock across the backing
allocator -- and, perhaps worse, across the backing allocator's free
routine, which sometimes issues a cross-call that requires all other
CPUs to be responsive.
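
For reference, the shape of the code at issue is roughly the following
(a simplified paraphrase of pool_grow() in sys/kern/subr_pool.c, not
the literal code); the whole point of the current structure is that
pr_lock is *not* held across the backing-allocator calls:

	/*
	 * Simplified paraphrase of the pool_grow() allocation step;
	 * error handling and page-header bookkeeping are omitted.
	 */
	KASSERT(mutex_owned(&pp->pr_lock));

	/*
	 * Drop the pool lock before calling into the backing allocator:
	 * it may sleep, and its free path may issue a cross-call that
	 * needs every other CPU to respond, so we must not be holding
	 * a lock that another CPU could be spinning on at raised IPL.
	 */
	mutex_exit(&pp->pr_lock);
	storage = pool_allocator_alloc(pp, flags);
	mutex_enter(&pp->pr_lock);
	if (storage == NULL)
		return ENOMEM;
	/*
	 * (pool_allocator_free() is likewise called with pr_lock
	 * dropped if a later step, e.g. the page-header allocation,
	 * fails.)
	 */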

The PR_GROWINGNOWAIT change addressed a real problem _without_ holding
the lock across the backing allocator. Unfortunately, I don't
remember how to provoke the problem, or where there is a record of the
symptoms; perhaps wiz@ does?

Based on the commit message, it sounds like the system would sometimes
fail to boot up, with a crash in init, because of concurrent pool
growth despite having lots of memory available.

This presumably _started_ to happen after I made sure only one thread
at a time could do pool_grow, in order to avoid the situation where
multiple threads doing simultaneous allocation lead to multiple
backing pages with only one object apiece, severely fragmenting kva:
PR kern/45718: processes sometimes get stuck and spin in vm_map
https://mail-index.NetBSD.org/tech-kern/2017/10/23/msg022472.html
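
For anyone without the code in front of them, the serialization that
change introduced, plus the NOWAIT escape hatch later bolted onto it,
looks roughly like this at the top of pool_grow() (paraphrased from
memory -- the exact flags, condvar, and return values in subr_pool.c
may differ in detail):

	KASSERT(mutex_owned(&pp->pr_lock));

	if (pp->pr_flags & PR_GROWING) {
		/* Another thread is already growing this pool. */
		if (flags & PR_WAITOK) {
			/* We may sleep: wait for it, then let the caller retry. */
			do {
				cv_wait(&pp->pr_cv, &pp->pr_lock);
			} while (pp->pr_flags & PR_GROWING);
			return ERESTART;
		}
		/*
		 * We may not sleep.  If the in-progress grower also
		 * could not sleep, it set PR_GROWINGNOWAIT, and we
		 * busy-wait by bouncing the lock so it can make
		 * progress; otherwise we just fail.
		 */
		if (pp->pr_flags & PR_GROWINGNOWAIT) {
			mutex_exit(&pp->pr_lock);
			mutex_enter(&pp->pr_lock);
			return ERESTART;
		}
		return EWOULDBLOCK;
	}
	pp->pr_flags |= PR_GROWING;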

Now, the PR_GROWINGNOWAIT mechanism is certainly growing into an
increasingly grody hack, but that means we need to carefully revisit
the design rather than just blithely create more deadlocks like
christos@ did back in 2017.