kern/51632: Fix a race condition of low priority xcall
>Number: 51632
>Category: kern
>Synopsis: Fix a race condition of low priority xcall
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Thu Nov 17 01:50:00 +0000 2016
>Originator: Ryota Ozaki
>Release: 6, 7, -current (since xcall appeared)
>Organization:
IIJ
>Environment:
NetBSD kvm 7.99.42 NetBSD 7.99.42 (KVM) #456: Wed Nov 16 17:57:19 JST 2016 ozaki-r@rangeley:(hidden) amd64
>Description:
xc_lowpri and xc_thread are racy: xc_wait may return before (or while)
all xcall callbacks are executed, resulting in a kernel panic at worst.

xc_lowpri serializes multiple jobs with a mutex and a condvar: once all
xcall callbacks of a job are done, xc_wait returns and xc_lowpri accepts
the next job.

The problem is that the counter of finished xcall callbacks is
incremented *before* a callback is actually executed (see xc_tailp++ in
xc_thread). So xc_lowpri accepts the next job before all callbacks of
the current job have completed, and the next job starts running its own
xcall callbacks.
Even worse, the counter is global and shared between jobs, so when an
xcall callback of the next job completes, the shared counter is
incremented. That fools xc_wait of the previous job into believing that
all of its own xcall callbacks are done, and xc_wait returns before (or
while) they are executed.
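
To make the ordering concrete, here is a condensed sketch of the
relevant code paths (abridged from sys/kern/subr_xcall.c before the
fix; locking details and unrelated lines are trimmed):

	/* xc_thread: per-CPU worker for low priority xcall */
	func = xc->xc_func;
	arg1 = xc->xc_arg1;
	arg2 = xc->xc_arg2;
	xc_tailp++;		/* counted as finished HERE... */
	mutex_exit(&xc->xc_lock);

	(*func)(arg1, arg2);	/* ...but actually executed here */

	mutex_enter(&xc->xc_lock);
	xc->xc_donep++;		/* the correctly placed counter */

	/* xc_lowpri: admits the next job as soon as xc_tailp catches
	 * up, i.e., possibly while callbacks of the current job are
	 * still running */
	while (xc->xc_headp != xc_tailp)
		cv_wait(&xc->xc_busy, &xc->xc_lock);

	/* xc_wait: returns once the shared xc_donep reaches this
	 * job's ticket; increments made by the next job's callbacks
	 * count too */
	while (xc->xc_donep < where)
		cv_wait(&xc->xc_busy, &xc->xc_lock);
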
In the psref_target_destroy case, the arguments of an xcall callback
are local variables of the function that calls xc_broadcast and
xc_wait. An early return from xc_wait therefore (auto-)deallocates
those variables, which leads to dangling dereferences by the xcall
callback and, say, a kernel panic.
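
The pattern looks roughly like this (a hypothetical sketch;
psref_target_destroy is from my modified kernel, and destroy_xc and
struct destroy_args are illustrative names, not actual code):

	struct destroy_args {
		struct psref_target *target;
		struct psref_class *class;
	};

	static void
	destroy_xc(void *arg1, void *arg2)
	{
		/* arg1 points into the caller's stack frame */
		struct destroy_args *a = arg1;

		KASSERT(a->target->prt_class == a->class);
		/* ... per-CPU destruction work ... */
	}

	void
	psref_target_destroy(struct psref_target *target,
	    struct psref_class *class)
	{
		/* locals: gone as soon as this function returns */
		struct destroy_args a = { target, class };

		xc_wait(xc_broadcast(0, destroy_xc, &a, NULL));
		/* If xc_wait returns early, the frame holding "a" is
		 * reused while destroy_xc still dereferences it. */
	}
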
One example of the resulting kernel panics is:
panic: kernel diagnostic assertion "(target->prt_class == class)" failed: file "(hidden)/sys/kern/subr_psref.c", line 485 mismatched psref target class: 0x0 (ref) != 0x2 (expected)
>How-To-Repeat:
I encountered the issue on a modified kernel that introduces
psref_target_destroy, which uses low priority xcall, for rtentries. The
modification allows psref_target_destroy to run in parallel for an
rtentry and an ifaddr. That said, the issue can theoretically happen
whenever any users of low priority xcall run in parallel.
I can reproduce the issue by letting destruction of an rtentry and an
ifaddr happen in parallel, for example with the following steps:
- boot a modified kernel with NET_MPSAFE enabled
- setup IP forwarding
- send traffic over the forwarding path
- repeat assigning and deassigning IP addresses on the interfaces
- wait for several minutes
>Fix:
For (I guess) historical reasons, there are two counters of finished
xcall callbacks for low priority xcall: xc_tailp and
xc_low_pri.xc_donep. xc_low_pri.xc_donep is incremented correctly,
i.e., after a callback has run, while xc_tailp is incremented wrongly,
i.e., before executing an xcall callback.

We can fix the issue by dropping xc_tailp and using only
xc_low_pri.xc_donep:
diff --git a/sys/kern/subr_xcall.c b/sys/kern/subr_xcall.c
index fb4630f..77996d6 100644
--- a/sys/kern/subr_xcall.c
+++ b/sys/kern/subr_xcall.c
@@ -105,7 +105,6 @@ typedef struct {
 
 /* Low priority xcall structures. */
 static xc_state_t xc_low_pri __cacheline_aligned;
-static uint64_t xc_tailp __cacheline_aligned;
 
 /* High priority xcall structures. */
 static xc_state_t xc_high_pri __cacheline_aligned;
@@ -134,7 +133,6 @@ xc_init(void)
 	memset(xclo, 0, sizeof(xc_state_t));
 	mutex_init(&xclo->xc_lock, MUTEX_DEFAULT, IPL_NONE);
 	cv_init(&xclo->xc_busy, "xclocv");
-	xc_tailp = 0;
 
 	memset(xchi, 0, sizeof(xc_state_t));
 	mutex_init(&xchi->xc_lock, MUTEX_DEFAULT, IPL_SOFTSERIAL);
@@ -256,7 +254,7 @@ xc_lowpri(xcfunc_t func, void *arg1, void *arg2, struct cpu_info *ci)
 	uint64_t where;
 
 	mutex_enter(&xc->xc_lock);
-	while (xc->xc_headp != xc_tailp) {
+	while (xc->xc_headp != xc->xc_donep) {
 		cv_wait(&xc->xc_busy, &xc->xc_lock);
 	}
 	xc->xc_arg1 = arg1;
@@ -277,7 +275,7 @@ xc_lowpri(xcfunc_t func, void *arg1, void *arg2, struct cpu_info *ci)
 		ci->ci_data.cpu_xcall_pending = true;
 		cv_signal(&ci->ci_data.cpu_xcall);
 	}
-	KASSERT(xc_tailp < xc->xc_headp);
+	KASSERT(xc->xc_donep < xc->xc_headp);
 	where = xc->xc_headp;
 	mutex_exit(&xc->xc_lock);
 
@@ -302,7 +300,7 @@ xc_thread(void *cookie)
 	mutex_enter(&xc->xc_lock);
 	for (;;) {
 		while (!ci->ci_data.cpu_xcall_pending) {
-			if (xc->xc_headp == xc_tailp) {
+			if (xc->xc_headp == xc->xc_donep) {
 				cv_broadcast(&xc->xc_busy);
 			}
 			cv_wait(&ci->ci_data.cpu_xcall, &xc->xc_lock);
@@ -312,7 +310,6 @@ xc_thread(void *cookie)
 		func = xc->xc_func;
 		arg1 = xc->xc_arg1;
 		arg2 = xc->xc_arg2;
-		xc_tailp++;
 		mutex_exit(&xc->xc_lock);
 
 		KASSERT(func != NULL);