NetBSD-Bugs archive


Re: kern/59339: heartbeat watchdog fires since 10.99.14



The following reply was made to PR kern/59339; it has been noted by GNATS.

From: Taylor R Campbell <riastradh%NetBSD.org@localhost>
To: Patrick Welche <prlw1%welche.eu@localhost>
Cc: gnats-bugs%NetBSD.org@localhost, netbsd-bugs%NetBSD.org@localhost, wiz%NetBSD.org@localhost
Subject: Re: kern/59339: heartbeat watchdog fires since 10.99.14
Date: Fri, 9 May 2025 00:03:52 +0000

 
 If you catch a core dump with a stack trace in itimer_transition, can
 you print all the locals and parameters in gdb?
 
 I believe this all started shortly after
 https://mail-index.NetBSD.org/source-changes/2025/04/01/msg156191.html
 and all the stack traces have been in callout processing, including
 one in the itimer_transition subroutine, which I rewrote in that
 commit, so the commit is likely related.
 
 My guess is that some itimer parameter, or bad timing, causes
 itimer_transition to do something foolish.  Alternatively, the
 conversion of the next timeout from a timespec to a duration in ticks,
 in tshztoup/tshzto, is going wrong.
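The second possibility can be pictured with a minimal userland model of converting an absolute deadline into a tick count. It is a sketch, not the kernel's tshzto()/tshztoup(): MODEL_HZ, model_ticks_until() and model_schedule() are hypothetical names, the 100 Hz tick rate is an assumption, and model_schedule() assumes (in line with the comment in itimer_arm_real that callout_schedule() checks the tick count itself) that a non-positive count is bumped to one tick. Under these assumptions, a timer whose computed expiry is already in the past is never pushed out to its next period; it simply fires again on the next tick.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Hypothetical tick rate for this model only; the kernel has its own hz. */
#define MODEL_HZ 100
#define NS_PER_SEC 1000000000LL

/*
 * Simplified model, not the kernel's tshzto()/tshztoup(): convert the
 * time remaining until an absolute deadline into clock ticks, rounding
 * up.  A deadline at or before `now` yields 0 ticks.
 */
static int
model_ticks_until(const struct timespec *deadline, const struct timespec *now)
{
	long long remaining_ns;
	long long tick_ns = NS_PER_SEC / MODEL_HZ;

	/* Inputs are assumed to be normalized timespecs. */
	assert(deadline->tv_nsec >= 0 && deadline->tv_nsec < NS_PER_SEC);
	assert(now->tv_nsec >= 0 && now->tv_nsec < NS_PER_SEC);

	remaining_ns = (long long)(deadline->tv_sec - now->tv_sec) * NS_PER_SEC +
	    (deadline->tv_nsec - now->tv_nsec);
	if (remaining_ns <= 0)
		return 0;	/* deadline already passed */
	/* Round up so the timer never fires before its deadline. */
	return (int)((remaining_ns + tick_ns - 1) / tick_ns);
}

/*
 * Assumed scheduler behaviour in this model: a non-positive tick count
 * is bumped to 1, so an expired deadline fires on the very next tick.
 */
static int
model_schedule(int ticks)
{
	return ticks <= 0 ? 1 : ticks;
}
```

For example, with now at 1000.000 s, a deadline 25 ms ahead gives 3 ticks (2.5 rounded up), a deadline 0.5 s in the past gives 0 ticks, and model_schedule() turns that 0 into 1.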
 
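 Can you please try the attached patch, which checks earlier for some
 possible kinds of foolishness?  It asserts that itimer_arm_real
 computes a positive number of ticks before calling callout_schedule,
 and that itimer_transition yields a next expiry strictly after now.
 If this confirms my hypothesis, it will turn a silent lockup into a
 panic that prints the offending timer values.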
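The check the patch adds after itimer_transition(), that the next expiry lies strictly after now, can be illustrated with a simplified userland model of a periodic timer. This is a sketch under stated assumptions, not NetBSD's implementation: model_itimer_transition() and its helpers are hypothetical, times are flattened to nanoseconds, the interval must be positive, and every expiry that passed without being serviced counts as one overrun. The assert() at the end plays the role of the KASSERTMSG.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

#define NS_PER_SEC 1000000000LL

/* Convert a normalized, non-negative timespec to nanoseconds. */
static long long
ts_to_ns(const struct timespec *ts)
{
	return (long long)ts->tv_sec * NS_PER_SEC + ts->tv_nsec;
}

/* Convert non-negative nanoseconds back to a normalized timespec. */
static struct timespec
ns_to_ts(long long ns)
{
	struct timespec ts;

	ts.tv_sec = (time_t)(ns / NS_PER_SEC);
	ts.tv_nsec = (long)(ns % NS_PER_SEC);
	return ts;
}

/*
 * Hypothetical model of a periodic itimer transition (not NetBSD's
 * itimer_transition): `value` is the expiry that just fired and
 * `interval` the period.  Compute the first expiry strictly after
 * `now`, and count the periods that elapsed unserviced as overruns.
 */
static void
model_itimer_transition(const struct timespec *value,
    const struct timespec *interval, const struct timespec *now,
    struct timespec *next, int *overruns)
{
	long long v = ts_to_ns(value);
	long long iv = ts_to_ns(interval);
	long long n = ts_to_ns(now);
	long long periods = 1;	/* always advance at least one period */

	assert(iv > 0);		/* model covers periodic timers only */
	if (v + iv <= n)	/* fell behind: skip every missed expiry */
		periods = (n - v) / iv + 1;
	*next = ns_to_ts(v + periods * iv);
	*overruns = (int)(periods - 1);

	/* The invariant the patch checks with KASSERTMSG: now < next. */
	assert(n < ts_to_ns(next));
}
```

With it_value at 1000.0 s and a 100 ms interval, a callout at 1000.05 s yields next = 1000.1 s and no overruns, while a late callout at 1000.35 s yields next = 1000.4 s and 3 overruns (the missed expiries at 1000.1, 1000.2 and 1000.3 s). Skipping ahead with one division, rather than adding the interval in a loop, keeps the model's cost constant however far behind the timer has fallen.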
 
 Attachment: pr59339-itimercalloutassert.patch
 
 # HG changeset patch
 # User Taylor R Campbell <riastradh%NetBSD.org@localhost>
 # Date 1746748216 0
 #      Thu May 08 23:50:16 2025 +0000
 # Branch trunk
 # Node ID a18d30166c2594b955c89dab60a2e459a49d39c0
 # Parent  5a05c6d6e6c147293aba92413649d75270baf356
 # EXP-Topic riastradh-pr59339-heartbeat
 WIP: Assert that itimer is scheduled for the future.
 
 PR kern/59339: heartbeat watchdog fires since 10.99.14
 
 diff -r 5a05c6d6e6c1 -r a18d30166c25 sys/kern/kern_time.c
 --- a/sys/kern/kern_time.c	Tue May 06 23:18:37 2025 +0000
 +++ b/sys/kern/kern_time.c	Thu May 08 23:50:16 2025 +0000
 @@ -821,6 +821,7 @@ itimer_decr(struct itimer *it, int nsec)
  static void
  itimer_arm_real(struct itimer * const it)
  {
 +	int ticks;
  
  	KASSERT(!it->it_dying);
  	KASSERT(!CLOCK_VIRTUAL_P(it->it_clockid));
 @@ -830,10 +831,15 @@ itimer_arm_real(struct itimer * const it
  	 * Don't need to check tshzto() return value, here.
  	 * callout_schedule() does it for us.
  	 */
 -	callout_schedule(&it->it_ch,
 -	    (it->it_clockid == CLOCK_MONOTONIC
 -		? tshztoup(&it->it_time.it_value)
 -		: tshzto(&it->it_time.it_value)));
 +	ticks = (it->it_clockid == CLOCK_MONOTONIC
 +	    ? tshztoup(&it->it_time.it_value)
 +	    : tshzto(&it->it_time.it_value));
 +	KASSERTMSG(ticks > 0, "[%u] it->it_time.it_value=%lld.%09ld ticks=%d",
 +	    it->it_clockid,
 +	    (long long)it->it_time.it_value.tv_sec,
 +	    (long)it->it_time.it_value.tv_nsec,
 +	    ticks);
 +	callout_schedule(&it->it_ch, ticks);
  }
  
  /*
 @@ -872,6 +878,18 @@ itimer_callout(void *arg)
  	 * now, compute the next itimer value and count overruns.
  	 */
  	itimer_transition(&it->it_time, &now, &next, &overruns);
 +	KASSERTMSG(timespeccmp(&now, &next, <),
 +	    "[clock %u]"
 +	    " it->it_time.it_value=%lld.%09ld"
 +	    " it->it_time.it_interval=%lld.%09ld"
 +	    " now=%lld.%09ld next=%lld.%09ld",
 +	    it->it_clockid,
 +	    (long long)it->it_time.it_value.tv_sec,
 +	    (long)it->it_time.it_value.tv_nsec,
 +	    (long long)it->it_time.it_interval.tv_sec,
 +	    (long)it->it_time.it_interval.tv_nsec,
 +	    (long long)now.tv_sec, (long)now.tv_nsec,
 +	    (long long)next.tv_sec, (long)next.tv_nsec);
  	it->it_time.it_value = next;
  	it->it_overruns += overruns;
  
 
 

