NetBSD-Bugs archive


Re: kern/59339: heartbeat watchdog fires since 10.99.14



If you catch a core dump with a stack trace in itimer_transition, can
you print all the locals and parameters in gdb?

I believe this all started happening shortly after
https://mail-index.NetBSD.org/source-changes/2025/04/01/msg156191.html
and all the stack traces have been in callout processing, including
one in the itimer_transition subroutine which I rewrote in that
commit, so it's likely to be related.

My guess is that there's some itimer parameter, or bad timing, which
causes itimer_transition to do something foolish -- that or the
conversion from the next timeout as a timespec to a duration in ticks,
in tshztoup/tshzto, is going wrong.
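
To illustrate the kind of conversion I mean, here is a rough user-space
sketch of turning an absolute timespec deadline into a relative tick
count.  This is not the kernel's tshzto()/tshztoup(): the function name
deadline_to_ticks, the explicit `now' and `hz' parameters, and the
choice to return 0 for a deadline that is not in the future are all
assumptions made for the example.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <time.h>

/*
 * Hypothetical helper, NOT the kernel's tshzto()/tshztoup().
 *
 * Convert an absolute deadline into a number of ticks from `now',
 * rounding up, at `hz' ticks per second.  Assumes both timespecs are
 * normalized (0 <= tv_nsec < 1000000000) and that their difference in
 * seconds fits in an int64_t.  Returns 0 if the deadline is not
 * strictly in the future, and clamps to INT_MAX on overflow.
 */
static int
deadline_to_ticks(const struct timespec *deadline,
    const struct timespec *now, int hz)
{
	int64_t sec, nsec, nsec_per_tick, ticks;

	assert(hz > 0 && hz <= 1000000000);
	nsec_per_tick = 1000000000 / hz;

	/* Relative time = deadline - now, with a borrow if needed. */
	sec = (int64_t)deadline->tv_sec - (int64_t)now->tv_sec;
	nsec = (int64_t)deadline->tv_nsec - (int64_t)now->tv_nsec;
	if (nsec < 0) {
		sec--;
		nsec += 1000000000;
	}

	/* Deadline already reached or passed: nothing to wait for. */
	if (sec < 0 || (sec == 0 && nsec == 0))
		return 0;

	/* Avoid overflow in sec * hz below. */
	if (sec > INT_MAX / hz)
		return INT_MAX;

	/* Whole seconds, plus the fractional part rounded up. */
	ticks = sec * hz + (nsec + nsec_per_tick - 1) / nsec_per_tick;
	return ticks > INT_MAX ? INT_MAX : (int)ticks;
}
```

In this sketch, a caller that passes a deadline at or before `now' gets
0 back, so what happens next depends entirely on how the caller treats
a non-positive tick count.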

Can you please try the attached patch?  It asserts that the tick count
passed to callout_schedule is positive, and that the next expiry
computed by itimer_transition lies after the current time.  If this
confirms my hypothesis, it will turn a silent lockup into a panic.
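
For reference, the invariant I expect to hold after advancing a
periodic timer can be sketched in user space as follows.  This is only
an analogue under simplifying assumptions, not the code in
itimer_transition: the names ts_to_ns, ns_to_ts and periodic_next are
made up for the example, and it assumes all times are non-negative and
small enough to fit in an int64_t count of nanoseconds.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/* Hypothetical helpers: timespec <-> nanoseconds (non-negative only). */
static int64_t
ts_to_ns(const struct timespec *ts)
{
	return (int64_t)ts->tv_sec * NSEC_PER_SEC + ts->tv_nsec;
}

static struct timespec
ns_to_ts(int64_t ns)
{
	struct timespec ts;

	ts.tv_sec = (time_t)(ns / NSEC_PER_SEC);
	ts.tv_nsec = (long)(ns % NSEC_PER_SEC);
	return ts;
}

/*
 * Hypothetical analogue of a periodic-timer transition.
 *
 * Given the expiry that just fired (`value'), a positive period
 * (`interval'), and the current time (`now' >= `value'), compute the
 * first expiry strictly after `now' and the number of whole periods
 * that elapsed in between (the overruns).
 */
static void
periodic_next(const struct timespec *value,
    const struct timespec *interval, const struct timespec *now,
    struct timespec *next, uint64_t *overruns)
{
	int64_t v = ts_to_ns(value);
	int64_t p = ts_to_ns(interval);
	int64_t n = ts_to_ns(now);
	int64_t missed;

	assert(p > 0);
	assert(n >= v);

	missed = (n - v) / p;		/* whole periods since `value' */
	*overruns = (uint64_t)missed;
	*next = ns_to_ts(v + (missed + 1) * p);

	/* Same shape as the check in the patch: now < next. */
	assert(ts_to_ns(next) > n);
}
```

With value = 10.0, interval = 0.25 and now = 10.6, this yields
next = 10.75 and two overruns.  If a computation of this kind ever
produced next <= now, the timer would be rearmed for a time that has
already passed, which is the case the second assertion in the patch is
meant to catch.
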
# HG changeset patch
# User Taylor R Campbell <riastradh%NetBSD.org@localhost>
# Date 1746748216 0
#      Thu May 08 23:50:16 2025 +0000
# Branch trunk
# Node ID a18d30166c2594b955c89dab60a2e459a49d39c0
# Parent  5a05c6d6e6c147293aba92413649d75270baf356
# EXP-Topic riastradh-pr59339-heartbeat
WIP: Assert that itimer is scheduled for future.

PR kern/59339: heartbeat watchdog fires since 10.99.14

diff -r 5a05c6d6e6c1 -r a18d30166c25 sys/kern/kern_time.c
--- a/sys/kern/kern_time.c	Tue May 06 23:18:37 2025 +0000
+++ b/sys/kern/kern_time.c	Thu May 08 23:50:16 2025 +0000
@@ -821,6 +821,7 @@ itimer_decr(struct itimer *it, int nsec)
 static void
 itimer_arm_real(struct itimer * const it)
 {
+	int ticks;
 
 	KASSERT(!it->it_dying);
 	KASSERT(!CLOCK_VIRTUAL_P(it->it_clockid));
@@ -830,10 +831,15 @@ itimer_arm_real(struct itimer * const it
 	 * Don't need to check tshzto() return value, here.
 	 * callout_schedule() does it for us.
 	 */
-	callout_schedule(&it->it_ch,
-	    (it->it_clockid == CLOCK_MONOTONIC
-		? tshztoup(&it->it_time.it_value)
-		: tshzto(&it->it_time.it_value)));
+	ticks = (it->it_clockid == CLOCK_MONOTONIC
+	    ? tshztoup(&it->it_time.it_value)
+	    : tshzto(&it->it_time.it_value));
+	KASSERTMSG(ticks > 0, "[%u] it->it_time.it_value=%lld.%09ld ticks=%d",
+	    it->it_clockid,
+	    (long long)it->it_time.it_value.tv_sec,
+	    (long)it->it_time.it_value.tv_nsec,
+	    ticks);
+	callout_schedule(&it->it_ch, ticks);
 }
 
 /*
@@ -872,6 +878,18 @@ itimer_callout(void *arg)
 	 * now, compute the next itimer value and count overruns.
 	 */
 	itimer_transition(&it->it_time, &now, &next, &overruns);
+	KASSERTMSG(timespeccmp(&now, &next, <),
+	    "[clock %u]"
+	    " it->it_time.it_value=%lld.%09ld"
+	    " it->it_time.it_interval=%lld.%09ld"
+	    " now=%lld.%09ld next=%lld.%09ld",
+	    it->it_clockid,
+	    (long long)it->it_time.it_value.tv_sec,
+	    (long)it->it_time.it_value.tv_nsec,
+	    (long long)it->it_time.it_interval.tv_sec,
+	    (long)it->it_time.it_interval.tv_nsec,
+	    (long long)now.tv_sec, (long)now.tv_nsec,
+	    (long long)next.tv_sec, (long)next.tv_nsec);
 	it->it_time.it_value = next;
 	it->it_overruns += overruns;
 

