NetBSD-Bugs archive
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]
Re: kern/59411 (deadlock on mbuf pool)
The following reply was made to PR kern/59411; it has been noted by GNATS.
From: Manuel Bouyer <manuel.bouyer%lip6.fr@localhost>
To: gnats-bugs%netbsd.org@localhost
Cc: kern-bug-people%netbsd.org@localhost, netbsd-bugs%netbsd.org@localhost, gnats-admin%netbsd.org@localhost,
riastradh%NetBSD.org@localhost, mrg%NetBSD.org@localhost, chs%NetBSD.org@localhost,
christos%NetBSD.org@localhost, tnn%NetBSD.org@localhost
Subject: Re: kern/59411 (deadlock on mbuf pool)
Date: Fri, 16 May 2025 11:50:07 +0200
On Fri, May 16, 2025 at 02:29:22AM +0000, riastradh%NetBSD.org@localhost wrote:
> [...]
> Until subr_pool.c rev. 1.220 in 2017, pool_grow dropped the lock around
> pool_allocator_alloc and thus avoided this deadlock, but it's unclear
> why that change was made:
>
>
> Module Name: src
> Committed By: christos
> Date: Fri Dec 29 16:13:26 UTC 2017
>
> Modified Files:
> src/sys/kern: subr_pool.c
>
> Log Message:
> Don't release the lock in the PR_NOWAIT allocation. Move flags setting
> after the acquiring the mutex. (from Tobias Nygren)
> ...
> @@ -1088,10 +1088,11 @@ pool_grow(struct pool *pp, int flags)
> }
> }
> pp->pr_flags |= PR_GROWING;
> - if ((flags & PR_WAITOK) == 0)
> + if (flags & PR_WAITOK)
> + mutex_exit(&pp->pr_lock);
> + else
> pp->pr_flags |= PR_GROWINGNOWAIT;
>
> - mutex_exit(&pp->pr_lock);
> char *cp = pool_allocator_alloc(pp, flags);
> if (__predict_false(cp == NULL))
> goto out;
>
> https://mail-index.NetBSD.org/source-changes/2017/12/29/msg090787.html
>
>
> Can you try reverting that change and see what happens?
>
> Unless we have an explanation of the motivation for that change, I
> think it is wrong, precisely because of the deadlock here: the pool
> lock MUST be released around the drain hook.
Testing this patch now:
Index: subr_pool.c
===================================================================
RCS file: /cvsroot/src/sys/kern/subr_pool.c,v
retrieving revision 1.285.4.2
diff -u -p -u -r1.285.4.2 subr_pool.c
--- subr_pool.c 15 Dec 2024 14:58:45 -0000 1.285.4.2
+++ subr_pool.c 16 May 2025 09:46:15 -0000
@@ -1411,11 +1411,10 @@ pool_grow(struct pool *pp, int flags)
}
}
pp->pr_flags |= PR_GROWING;
- if (flags & PR_WAITOK)
- mutex_exit(&pp->pr_lock);
- else
+ if ((flags & PR_WAITOK) == 0)
pp->pr_flags |= PR_GROWINGNOWAIT;
+ mutex_exit(&pp->pr_lock);
storage = pool_allocator_alloc(pp, flags);
if (__predict_false(storage == NULL))
goto out;
@@ -1426,8 +1425,7 @@ pool_grow(struct pool *pp, int flags)
goto out;
}
- if (flags & PR_WAITOK)
- mutex_enter(&pp->pr_lock);
+ mutex_enter(&pp->pr_lock);
pool_prime_page(pp, storage, ph);
pp->pr_npagealloc++;
KASSERT(pp->pr_flags & PR_GROWING);
@@ -1439,8 +1437,7 @@ pool_grow(struct pool *pp, int flags)
cv_broadcast(&pp->pr_cv);
return 0;
out:
- if (flags & PR_WAITOK)
- mutex_enter(&pp->pr_lock);
+ mutex_enter(&pp->pr_lock);
KASSERT(pp->pr_flags & PR_GROWING);
pp->pr_flags &= ~(PR_GROWING|PR_GROWINGNOWAIT);
return ENOMEM;
I will reboot the aftected server this evening; need to wait at last
monday evening to be sure the issue isn't back.
>
> The surrounding PR_GROWINGNOWAIT logic -- which you recently
> half-deleted, though I have asked to restore it until we have
> adequately analyzed the situation and history
> (https://mail-index.NetBSD.org/tech-kern/2025/05/16/msg030467.html) --
> had been introduced two weeks prior by mrg@, and apparently reviewed by
> me (although I have no memory of it), but I can't find any record of
> the symptoms this change was supposed to fix:
With the above patch, PR_GROWINGNOWAIT isn't a NOOP any more.
--
Manuel Bouyer, LIP6, Sorbonne Université. Manuel.Bouyer%lip6.fr@localhost
NetBSD: 26 ans d'experience feront toujours la difference
--
Home |
Main Index |
Thread Index |
Old Index