kern/59692: itimer(9): possible infinite loop

To: kern-bug-people%netbsd.org@localhost,gnats-admin%netbsd.org@localhost,netbsd-bugs%netbsd.org@localhost
Subject: kern/59692: itimer(9): possible infinite loop
From: campbell+netbsd%mumble.net@localhost
Date: Sun, 5 Oct 2025 14:45:01 +0000 (UTC)

>Number:         59692
>Category:       kern
>Synopsis:       itimer(9): possible infinite loop
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Oct 05 14:45:01 +0000 2025
>Originator:     Taylor R Campbell
>Release:        current, 11, 10, 9, ...
>Organization:
The Clock of the Long BSD Foundation
>Environment:
>Description:

	The algorithm run periodically for an itimer is roughly as
	follows:

	1. Check the clock, call it `now', with getnano(up)time.

	2. Based on the time it previously fired, `last', and on `now',
	   (a) how many overruns there have been, and
	   (b) at what struct timespec the itimer should fire next,
	       call it `next'.

	3. Check the clock again, call it `now1'.

	4. callout_schedule(tstohz(next - now1)).

	Source references:

	https://nxr.netbsd.org/xref/src/sys/kern/kern_time.c?r=1.228#839
	https://nxr.netbsd.org/xref/src/sys/kern/kern_time.c?r=1.228#816
	https://nxr.netbsd.org/xref/src/sys/kern/subr_time.c?r=1.41#65

	Note that, barring clock changes, last < now < MIN(now1, next).
	However, it is not guaranteed that now1 < next.  If the time to
	compute step 2 is large enough (and this may be unpredictably
	delayed by interrupt processing -- possibly even by scheduling,
	if softints are preemptible), it may happen that next < now1.

	In this case, where next < now1, tstohz will return 0 and
	callout_schedule will schedule the callout immediately, after
	zero ticks.

	If this happens repeatedly, the kernel will loop through
	callout processing without returning callout_softclock, the
	callout softint handler at IPL_SOFTCLOCK.  This can lead to
	heartbeat panics, which check the progress of a different
	softint handler at IPL_SOFTCLOCK.

	I hypothesize this phenomenon may be one of the two causes of
	the following PRs:

	PR kern/59339: heartbeat watchdog fires since 10.99.14
	https://gnats.NetBSD.org/59339

	PR kern/59465: Recurring kernel panic with -current (10.99.14):
	"heart stopped beating"
	https://gnats.NetBSD.org/59465

	PR kern/59679: Multiple "heart stopped beating" / "softints
	stuck for 16 seconds" panics
	https://gnats.NetBSD.org/59679


>How-To-Repeat:

	haven't found a compact reproducer


>Fix:

	Several options:

	1. Change tvtohz/tstohz to map 0 sec to 1 tick.

	   Currently these map any timeout <=0 to 0 tick, and any
	   timeout >0 to >=2 tick -- this is because we don't know how
	   far away the next hardclock interrupt will be, and since it
	   could be arbitrarily small, we have to wait for two
	   hardclock interrupts to guarantee having waited at least
	   some nonzero time.

	   But, perhaps we should map any timeout <0 to 0 tick, and a
	   zero timeout to 1 tick.  This will always wait some nonzero
	   duration, but there is no lower bound to how short that
	   duration will be before the next hardclock interrupt: For a
	   kernel running at hardclock frequency f, this will fire at
	   somewhere in the interval (0, (1 sec)/f] -- that is, it will
	   fire at the _next_ hardclock interrupt, not immediately.

	2. Stop sampling now1 and use tstohz(next - now) instead.

	   This may drift a little bit, but the itimer transition logic
	   will account for that the next time it fires -- it may
	   report an overrun, but it always aims for for times value +
	   k*interval for integer k.

	   It may also be a little cheaper, particularly when we adopt
	   high-resolution timers for which we will really want to use
	   the preciser but costlier nano(up)time which query the
	   timecounter directly rather than the coarse and cheap
	   getnano(up)time which just read the last hardclock
	   interrupt's cache of the timecounter.

	   This is what the v6 patch at https://gnats.NetBSD.org/59339
	   https://mail-index.NetBSD.org/netbsd-bugs/2025/05/12/msg089000.html
	   does (in addition to fixing PR kern/59691: tstohz(9) fails
	   to round up on some inputs).

	3. Change callout_softclock to grab a batch of callouts and
	   process that before returning.

	   This way other softints at IPL_SOFTCLOCK have a chance to
	   run even if more callouts are scheduled while
	   callout_softclock is running: if a new callout is scheduled,
	   it will be handled in the _next_ batch.

	   We do this for other kinds of queued processing like
	   workqueues:

    174 	for (;;) {
    175 		struct workqhead tmp;
    176 
    177 		SIMPLEQ_INIT(&tmp);
    178 
    179 		while (SIMPLEQ_EMPTY(&q->q_queue_pending))
    180 			cv_wait(&q->q_cv, &q->q_mutex);
    181 		SIMPLEQ_CONCAT(&tmp, &q->q_queue_pending);
    182 		SIMPLEQ_INIT(&q->q_queue_pending);
    183 
    184 		/*
    185 		 * Mark the queue as actively running a batch of work
    186 		 * by setting the generation number odd.
    187 		 */
    188 		q->q_gen |= 1;
    189 		mutex_exit(&q->q_mutex);
    190 
    191 		workqueue_runlist(wq, &tmp);
    192 
    193 		/*
    194 		 * Notify workqueue_wait that we have completed a batch
    195 		 * of work by incrementing the generation number.
    196 		 */
    197 		mutex_enter(&q->q_mutex);
    198 		KASSERTMSG(q->q_gen & 1, "q=%p gen=%"PRIu64, q, q->q_gen);
    199 		q->q_gen++;
    200 		cv_broadcast(&q->q_cv);
    201 	}

	   https://nxr.netbsd.org/xref/src/sys/kern/subr_workqueue.c?r=1.48#174

	   For callout(9), the operation corresponding to
	   SIMPLEQ_CONCAT is called CIRCQ_APPEND (with arguments
	   reversed).

Prev by Date: Re: kern/59691 (tstohz(9) fails to round up on some inputs)
Next by Date: Re: kern/59678 ("ring->queued > 0" panic from utwn)
Previous by Thread: Re: kern/59691 (tstohz(9) fails to round up on some inputs)
Next by Thread: Re: kern/59692: itimer(9): possible infinite loop
Indexes:

Home | Main Index | Thread Index | Old Index