NetBSD-Bugs archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: kern/59411 (deadlock on mbuf pool)



Synopsis: deadlock on mbuf pool

State-Changed-From-To: open->feedback
State-Changed-By: riastradh%NetBSD.org@localhost
State-Changed-When: Fri, 16 May 2025 02:29:22 +0000
State-Changed-Why:
[+cc mrg chs christos tnn, who were all involved in the background
leading up to this state of affairs]


Until subr_pool.c rev. 1.220 in 2017, pool_grow dropped the lock around
pool_allocator_alloc and thus avoided this deadlock, but it's unclear
why that change was made:


Module Name:    src
Committed By:   christos
Date:           Fri Dec 29 16:13:26 UTC 2017

Modified Files:
        src/sys/kern: subr_pool.c

Log Message:
Don't release the lock in the PR_NOWAIT allocation. Move flags setting
after the acquiring the mutex. (from Tobias Nygren)
...
@@ -1088,10 +1088,11 @@ pool_grow(struct pool *pp, int flags)
 		}
 	}
 	pp->pr_flags |= PR_GROWING;
-	if ((flags & PR_WAITOK) == 0)
+	if (flags & PR_WAITOK)
+		mutex_exit(&pp->pr_lock);
+	else
 		pp->pr_flags |= PR_GROWINGNOWAIT;
 
-	mutex_exit(&pp->pr_lock);
 	char *cp = pool_allocator_alloc(pp, flags);
 	if (__predict_false(cp == NULL))
 		goto out;

https://mail-index.NetBSD.org/source-changes/2017/12/29/msg090787.html


Can you try reverting that change and see what happens?

Unless we have an explanation of the motivation for that change, I
think it is wrong, precisely because of the deadlock here: the pool
lock MUST be released around the drain hook.


The surrounding PR_GROWINGNOWAIT logic -- which you recently
half-deleted, though I have asked to restore it until we have
adequately analyzed the situation and history
(https://mail-index.NetBSD.org/tech-kern/2025/05/16/msg030467.html) --
had been introduced two weeks prior by mrg@, and apparently reviewed by
me (although I have no memory of it), but I can't find any record of
the symptoms this change was supposed to fix:


Module Name:    src
Committed By:   mrg
Date:           Sat Dec 16 03:13:29 UTC 2017

Modified Files:
        src/sys/kern: subr_pool.c
        src/sys/sys: pool.h

Log Message:
hopefully workaround the irregularly "fork fails in init" problem.

if a pool is growing, and the grower is PR_NOWAIT, mark this.
if another caller wants to grow the pool and is also PR_NOWAIT,
busy-wait for the original caller, which should either succeed
or hard-fail fairly quickly.

implement the busy-wait by unlocking and relocking this pools
mutex and returning ERESTART.  other methods (such as having
the caller do this) were significantly more code and this hack
is fairly localised.

ok chs@ riastradh@
...
@@ -1073,10 +1073,23 @@ pool_grow(struct pool *pp, int flags)
 			} while (pp->pr_flags & PR_GROWING);
 			return ERESTART;
 		} else {
+			if (pp->pr_flags & PR_GROWINGNOWAIT) {
+				/*
+				 * This needs an unlock/relock dance so
+				 * that the other caller has a chance to
+				 * run and actually do the thing.  Note
+				 * that this is effectively a busy-wait.
+				 */
+				mutex_exit(&pp->pr_lock);
+				mutex_enter(&pp->pr_lock);
+				return ERESTART;
+			}
 			return EWOULDBLOCK;
 		}
 	}
 	pp->pr_flags |= PR_GROWING;
+	if ((flags & PR_WAITOK) == 0)
+		pp->pr_flags |= PR_GROWINGNOWAIT;
 
 	mutex_exit(&pp->pr_lock);
 	char *cp = pool_allocator_alloc(pp, flags);

https://mail-index.NetBSD.org/source-changes/2017/12/16/msg090490.html





Home | Main Index | Thread Index | Old Index