Re: kern/57226: NetBSD 10_BETA kernel crash
The following reply was made to PR kern/57226; it has been noted by GNATS.
From: Taylor R Campbell <riastradh%NetBSD.org@localhost>
To: PHO <pho%cielonegro.org@localhost>
Cc: gnats-bugs%NetBSD.org@localhost
Subject: Re: kern/57226: NetBSD 10_BETA kernel crash
Date: Sun, 25 Jun 2023 18:37:12 +0000
> Date: Mon, 26 Jun 2023 00:10:04 +0900
> From: PHO <pho%cielonegro.org@localhost>
>
> But what I found with a bunch of debugging logs is that:
>
> 1. cpu0 calls timerfd_create() to initialize an itimer.
>
> 2. The same CPU calls timerfd_settime() to schedule the timer.
>
> 3. After some ticks itimer_callout() gets invoked. It re-schedules the
> callout because it's not dying yet.
>
> 4. itimer_callout() releases the itimer lock right before returning.
>
> 5. But before itimer_callout() returns and callout_softclock()
> re-acquires the callout lock, cpu1 calls timerfd_destroy().
>
> 6. cpu1 can acquire the itimer lock (because it's now released), and
> calls itimer_poison(). It marks the itimer as dying, but it's too late.
> itimer_poison() invokes callout_halt(), which returns without waiting
> for the callout to finish because c->c_flags no longer contains
> CALLOUT_FIRED. The flag was cleared in step 3.
>
> 7. cpu1 then calls itimer_fini(), which in turn calls callout_destroy().
> cpu0 is still in callout_softclock() and trying to lock the callout,
> which makes cpu1 panic.
Ah! I understand. The problem is that there is a window between:
	/* itimer_callout */
	if (!it->it_dying)
		callout_schedule(&it->it_ch, ...);	/* (A) */
	itimer_unlock(it);
	return;

	/* callout_softclock */
	cc->cc_active = NULL;				/* (B) */
During that time, between (A) and (B), it->it_ch doesn't have
CALLOUT_FIRED set because callout_schedule clears it at (A).
So if another CPU calls itimer_poison and then itimer_fini,
itimer_poison won't wait for (B) to have happened, and itimer_fini can
run before (B), in which case callout_destroy crashes.
The bug is that callout_halt fails to wait for (B).
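For reference, here's the shape of the pre-patch logic in callout_halt
that misses the window (condensed from the old code; `flags' is the
snapshot taken under the callout lock):

	flags = c->c_flags;
	if ((flags & CALLOUT_PENDING) != 0)
		CIRCQ_REMOVE(&c->c_list);
	c->c_flags = flags & ~(CALLOUT_PENDING|CALLOUT_FIRED);
	/*
	 * Between (A) and (B) the handler is still running on cpu0,
	 * but callout_schedule cleared CALLOUT_FIRED at (A), so this
	 * test is false and we return without waiting for (B).
	 */
	if (__predict_false(flags & CALLOUT_FIRED)) {
		callout_wait(c, interlock, lock);
		return true;
	}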
Nice work! I agree this is the right analysis and correct solution.
Couple nits:
> callout(9): Obtain the lock before checking if a callout is destroyable
>
> One cannot safely observe the state of a callout without holding its lock,
> because c->c_flags and c->c_cpu might be getting mutated somewhere else.
Did you observe this to be necessary to fix the problem, or do you
have a theory for why this should be necessary? I don't think it
should be. I think if anything, it should be under #ifdef DEBUG (but
I'm not sure we should keep it at all). The comment you deleted in
this change,
	/*
	 * It's not necessary to lock in order to see the correct value
	 * of c->c_flags.  If the callout could potentially have been
	 * running, the current thread should have stopped it.
	 */
correctly reflects the callout(9) API contract, so if there is any
race here, it is either a bug in callout(9) or a user error -- and we
want to detect it and fail noisily in either case, not suppress it by
taking a lock.
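If we did keep it, what I mean is roughly this DEBUG-only sketch
(illustrative only, not a committed change):

	#ifdef DEBUG
	/*
	 * Diagnostic only: hold the callout lock so the KASSERTs
	 * can't race with a concurrent mutation of c->c_flags.  The
	 * callout(9) contract already requires the callout to be
	 * stopped before callout_destroy is called.
	 */
	kmutex_t *lock = callout_lock(c);
	#endif
	/* ... the existing KASSERTMSG checks ... */
	#ifdef DEBUG
	mutex_spin_exit(lock);
	#endif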
In the other change, which I think is the only one that should be
needed to solve the problem:
> @@ -593,7 +592,7 @@ callout_wait(callout_impl_t *c, void *interlock, kmutex_t *lock)
> * - the callout itself has called callout_halt() (nice!)
> */
> cc = c->c_cpu;
I think you need to make sure to load c->c_cpu exactly once here, not
once in callout_wait and then once again in
callout_running_somewhere_else.
It's possible that one can prove it is OK to reload c->c_cpu twice,
but let's keep the change minimal.
In fact, for now I would suggest just inlining it in callout_halt, so
that the fix is a one-line change both to commit initially and to pull
up; then we can tidy it up afterward with no-functional-change commits.
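Concretely, something like this is the shape I mean -- a sketch only,
reusing the cc_active/cc_lwp test from your helper, with c->c_cpu
loaded exactly once under the callout lock:

	struct callout_cpu *cc;	/* new local in callout_halt */
	...
	c->c_flags &= ~(CALLOUT_PENDING|CALLOUT_FIRED);
	cc = c->c_cpu;		/* load c->c_cpu exactly once */
	if (__predict_false(cc->cc_active == c && cc->cc_lwp != curlwp)) {
		callout_wait(c, interlock, lock);
		return true;
	}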
> - if (__predict_true(cc->cc_active != c || cc->cc_lwp == l))
> + if (__predict_false(!callout_running_somewhere_else(c)))
This should stay __predict_true. Callouts are supposed to be quick,
so by the time we've reloaded c->c_cpu from memory, it may have
changed again, even though the caller already determined
callout_running_somewhere_else returned true.
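That is, keep the original prediction when using the helper in
callout_wait, e.g.:

	/*
	 * Predict that the handler has already finished by the time
	 * we get here and reload c->c_cpu.
	 */
	if (__predict_true(!callout_running_somewhere_else(c)))
		break;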
(Attached are the changes I reviewed, for audit trail.)
From 736fd5151124eaf2b5de1e6b5f4f0d6c71edd445 Mon Sep 17 00:00:00 2001
From: PHO <pho%cielonegro.org@localhost>
Date: Sun, 25 Jun 2023 20:15:44 +0900
Subject: [PATCH] callout(9): Fix panic() in callout_destroy() (kern/57226)
The culprit was callout_halt(). "(c->c_flags & CALLOUT_FIRED) != 0" wasn't
the correct way to check if a callout is running. It failed to wait for a
running callout to finish in the following scenario:
1. cpu0 initializes a callout and schedules it.
2. cpu0 invokes callout_softclock() and fires the callout, setting the flag
CALLOUT_FIRED.
3. The callout invokes callout_schedule() to re-schedule itself.
4. callout_schedule_locked() clears the flag CALLOUT_FIRED, and releases
the lock.
5. Before the lock is re-acquired by callout_softclock(), cpu1 decides to
destroy the callout. It first invokes callout_halt() to make sure the
callout finishes running.
6. But since CALLOUT_FIRED has been cleared, callout_halt() thinks it's not
running and therefore returns without invoking callout_wait().
7. cpu1 proceeds to invoke callout_destroy() while it's still running on
cpu0. callout_destroy() detects that and panics.
---
sys/kern/kern_timeout.c | 26 +++++++++++++++++++-------
1 file changed, 19 insertions(+), 7 deletions(-)
diff --git a/sys/kern/kern_timeout.c b/sys/kern/kern_timeout.c
index 8c2b180c6dae..210f24dc942b 100644
--- a/sys/kern/kern_timeout.c
+++ b/sys/kern/kern_timeout.c
@@ -191,6 +191,7 @@ static struct callout_cpu ccb;
#ifndef CRASH /* _KERNEL */
static void callout_softclock(void *);
static void callout_wait(callout_impl_t *, void *, kmutex_t *);
+static bool callout_running_somewhere_else(callout_impl_t *);

static struct callout_cpu callout_cpu0 __cacheline_aligned;
static void *callout_sih __read_mostly;
@@ -375,7 +376,7 @@ callout_destroy(callout_t *cs)
KASSERTMSG((c->c_flags & CALLOUT_PENDING) == 0,
"pending callout %p: c_func (%p) c_flags (%#x) destroyed from %p",
c, c->c_func, c->c_flags, __builtin_return_address(0));
- KASSERTMSG(c->c_cpu->cc_lwp == curlwp || c->c_cpu->cc_active != c,
+ KASSERTMSG(!callout_running_somewhere_else(c),
"running callout %p: c_func (%p) c_flags (%#x) destroyed from %p",
c, c->c_func, c->c_flags, __builtin_return_address(0));
c->c_magic = 0;
@@ -543,7 +544,6 @@ callout_halt(callout_t *cs, void *interlock)
{
callout_impl_t *c = (callout_impl_t *)cs;
kmutex_t *lock;
- int flags;

KASSERT(c->c_magic == CALLOUT_MAGIC);
KASSERT(!cpu_intr_p());
@@ -553,11 +553,10 @@ callout_halt(callout_t *cs, void *interlock)
lock = callout_lock(c);
SDT_PROBE4(sdt, kernel, callout, halt,
c, c->c_func, c->c_arg, c->c_flags);
- flags = c->c_flags;
- if ((flags & CALLOUT_PENDING) != 0)
+ if ((c->c_flags & CALLOUT_PENDING) != 0)
CIRCQ_REMOVE(&c->c_list);
- c->c_flags = flags & ~(CALLOUT_PENDING|CALLOUT_FIRED);
- if (__predict_false(flags & CALLOUT_FIRED)) {
+ c->c_flags &= ~(CALLOUT_PENDING|CALLOUT_FIRED);
+ if (__predict_false(callout_running_somewhere_else(c))) {
callout_wait(c, interlock, lock);
return true;
}
@@ -593,7 +592,7 @@ callout_wait(callout_impl_t *c, void *interlock, kmutex_t *lock)
* - the callout itself has called callout_halt() (nice!)
*/
cc = c->c_cpu;
- if (__predict_true(cc->cc_active != c || cc->cc_lwp == l))
+ if (__predict_false(!callout_running_somewhere_else(c)))
break;

/* It's running - need to wait for it to complete. */
@@ -770,6 +769,19 @@ callout_ack(callout_t *cs)
mutex_spin_exit(lock);
}

+/*
+ * Check if the callout is currently running on an LWP that isn't curlwp.
+ */
+static bool
+callout_running_somewhere_else(callout_impl_t *c)
+{
+ struct callout_cpu *cc = c->c_cpu;
+
+ KASSERT(mutex_owned(cc->cc_lock));
+
+ return cc->cc_active == c && cc->cc_lwp != curlwp;
+}
+
/*
* callout_hardclock:
*
From dfe928e33245062241b936a0232d5f6b6a3e98a6 Mon Sep 17 00:00:00 2001
From: PHO <pho%cielonegro.org@localhost>
Date: Sun, 25 Jun 2023 20:28:31 +0900
Subject: [PATCH] callout(9): Obtain the lock before checking if a callout is
destroyable
One cannot safely observe the state of a callout without holding its lock,
because c->c_flags and c->c_cpu might be getting mutated somewhere else.
---
sys/kern/kern_timeout.c | 17 +++++++++--------
1 file changed, 9 insertions(+), 8 deletions(-)
diff --git a/sys/kern/kern_timeout.c b/sys/kern/kern_timeout.c
index 3f3338a67872..8c2b180c6dae 100644
--- a/sys/kern/kern_timeout.c
+++ b/sys/kern/kern_timeout.c
@@ -364,17 +364,14 @@ void
callout_destroy(callout_t *cs)
{
callout_impl_t *c = (callout_impl_t *)cs;
-
+#if defined(KDTRACE_HOOKS) || defined(DIAGNOSTIC)
+ kmutex_t *lock = callout_lock(c);
+#endif
SDT_PROBE1(sdt, kernel, callout, destroy, cs);

KASSERTMSG(c->c_magic == CALLOUT_MAGIC,
"callout %p: c_magic (%#x) != CALLOUT_MAGIC (%#x)",
c, c->c_magic, CALLOUT_MAGIC);
- /*
- * It's not necessary to lock in order to see the correct value
- * of c->c_flags. If the callout could potentially have been
- * running, the current thread should have stopped it.
- */
KASSERTMSG((c->c_flags & CALLOUT_PENDING) == 0,
"pending callout %p: c_func (%p) c_flags (%#x) destroyed from %p",
c, c->c_func, c->c_flags, __builtin_return_address(0));
@@ -382,6 +379,10 @@ callout_destroy(callout_t *cs)
"running callout %p: c_func (%p) c_flags (%#x) destroyed from %p",
c, c->c_func, c->c_flags, __builtin_return_address(0));
c->c_magic = 0;
+
+#if defined(KDTRACE_HOOKS) || defined(DIAGNOSTIC)
+ mutex_spin_exit(lock);
+#endif
}

/*
@@ -654,12 +655,12 @@ callout_bind(callout_t *cs, struct cpu_info *ci)
struct callout_cpu *cc;
kmutex_t *lock;

- KASSERT((c->c_flags & CALLOUT_PENDING) == 0);
- KASSERT(c->c_cpu->cc_active != c);
KASSERT(c->c_magic == CALLOUT_MAGIC);
KASSERT((c->c_flags & CALLOUT_MPSAFE) != 0);

lock = callout_lock(c);
+ KASSERT(c->c_cpu->cc_active != c);
+ KASSERT((c->c_flags & CALLOUT_PENDING) == 0);
cc = ci->ci_data.cpu_callout;
c->c_flags |= CALLOUT_BOUND;
if (c->c_cpu != cc) {