Re: PSA: Clock drift and pkgin

To: Mouse <mouse%Rodents-Montreal.ORG@localhost>
Subject: Re: PSA: Clock drift and pkgin
From: Taylor R Campbell <campbell+netbsd-tech-kern%mumble.net@localhost>
Date: Sat, 23 Dec 2023 22:05:52 +0000

> Date: Sat, 23 Dec 2023 12:24:08 -0500 (EST)
> From: Mouse <mouse%Rodents-Montreal.ORG@localhost>
> 
> >>         } else if (sec <= (LONG_MAX / 1000000))
> >>                 ticks = (((sec * 1000000) + (unsigned long)usec + (tick - 1))
> >>                     / tick) + 1;
> 
> > Whether this is a bug depends on whether: [...]
> 
> I think that code is not a bug per se; for sleeps, that codepath
> is...well, "reasonable" at the very least.  The bug is that it is
> broken for timer reloads, but timer reloads are using it anyway;
> whether you think of this as a bug in timer reloads or a bug in tvtohz
> is a question of which way you prefer to squint your mind.

This was introduced in kern_clock.c 1.62 in July 2000 by thorpej,
specifically to avoid waking up sleeps early by up to one tick's worth
of time:

https://mail-index.netbsd.org/source-changes/2000/07/13/msg081191.html

Author: thorpej <thorpej%NetBSD.org@localhost>
Date:   Thu Jul 13 17:06:15 2000 +0000

    New hzto() function from FreeBSD and Artur Grabowski <art%stacken.kth.se@localhost>.
    Stops sleeps from returning early (by up to a clock tick), and return 0
    ticks for timeouts that should happen now or in the past.
    
    Returning 0 is different from the legacy hzto() interface, and callers
    need to check for it.

The attached (untested) patch reverts to the old algorithm
specifically for the case of rearming a periodic timer, leaving the
new algorithm with +1 in place for all other uses.

Now, it's unclear to me whether this is correct, because it can have
the following effect.  Suppose ticks happen on intervals of time T.
Then, owing to unpredictable and uncontrollable scheduling delays, the
following sequence of events may happen:

Time 0*T: timer_settime(.it_value = T, .it_interval = T), arms timer at 1*T
Time 1*T + 0.9*T: timer expires, rearms timer at 2*T
Time 2*T + 0.1*T: timer expires, rearms timer at 3*T

The duration between these consecutive expirations of the timer is
0.2*T, even though we asked for an interval of T.

Of course, the _average_ duration will be around T, but the _minimum_
duration is essentially zero.  POSIX clearly forbids a _one-shot_
timer, which is scheduled to expire after time T, to actually expire
after only time 0.2*T.  But the language in POSIX is unclear to me on
whether this is allowed for _periodic_ timers:

https://pubs.opengroup.org/onlinepubs/9699919799/functions/timer_settime.html

   The reload value of the timer shall be set to the value specified
   by the it_interval member of value.  When a timer is armed with a
   non-zero it_interval, a periodic (or repetitive) timer is
   specified.

   Time values that are between two consecutive non-negative integer
   multiples of the resolution of the specified timer shall be rounded
   up to the larger multiple of the resolution.  Quantization error
   shall not cause the timer to expire earlier than the rounded time
   value.

   If the argument ovalue is not NULL, the timer_settime() function
   shall store, in the location referenced by ovalue, a value
   representing the previous amount of time before the timer would
   have expired, or zero if the timer was disarmed, together with the
   previous timer reload value.  Timers shall not expire before their
   scheduled time.

So on the one hand, I'm not sure it's correct to implement the timer
APIs (setitimer, timer_settime) with this patch.

On the other hand, if it's not correct to do that, I'm not sure
correct POSIX periodic timers can attain a target _average_ interval
between expirations -- only a target _minimum_, with an average that
is higher by necessity.

From bca90989210e59c45efee9b44d7575e91f053a07 Mon Sep 17 00:00:00 2001
From: Taylor R Campbell <riastradh%NetBSD.org@localhost>
Date: Sat, 23 Dec 2023 21:03:13 +0000
Subject: [PATCH] time(9): Avoid adding partial tick to periodic interval
 timers.

This way, configuring a periodic timer with period <=1/hz sec will
delay ~1/hz sec between consecutive firings.

This is different from one-shot sleeps of <=1/hz sec, which are still
scheduled two ticks away, giving a delay of [1/hz sec, 2/hz sec],
since there's no way to reliably sleep for at least a shorter
duration than 1/hz sec starting at an arbitrary time between ticks.

PR kern/43997
---
 sys/kern/kern_time.c | 24 ++++++++++++++-
 sys/kern/subr_time.c | 72 +++++++++++++++++++++++++++++++++++++++++---
 sys/sys/timevar.h    |  2 ++
 3 files changed, 92 insertions(+), 6 deletions(-)

diff --git a/sys/kern/kern_time.c b/sys/kern/kern_time.c
index 7bc39324e32e..5130ddd390f5 100644
--- a/sys/kern/kern_time.c
+++ b/sys/kern/kern_time.c
@@ -824,6 +824,28 @@ itimer_arm_real(struct itimer * const it)
 		: tshzto(&it->it_time.it_value)));
 }
 
+/*
+ * itimer_rearm_real:
+ *
+ *	Rearm a non-virtual timer.
+ */
+static void
+itimer_rearm_real(struct itimer * const it)
+{
+	int value_ticks, interval_ticks;
+
+	KASSERT(!it->it_dying);
+	KASSERT(!CLOCK_VIRTUAL_P(it->it_clockid));
+	KASSERT(!callout_pending(&it->it_ch));
+
+	value_ticks = (it->it_clockid == CLOCK_MONOTONIC
+	    ? tshztoup(&it->it_time.it_value)
+	    : tshzto(&it->it_time.it_value));
+	interval_ticks = tstohz_full(&it->it_time.it_interval);
+
+	callout_schedule(&it->it_ch, MIN(value_ticks, interval_ticks));
+}
+
 /*
  * itimer_callout:
  *
@@ -889,7 +911,7 @@ itimer_callout(void *arg)
 	 * Reset the callout, if it's not going away.
 	 */
 	if (!it->it_dying)
-		itimer_arm_real(it);
+		itimer_rearm_real(it);
 	itimer_unlock();
 }
 
diff --git a/sys/kern/subr_time.c b/sys/kern/subr_time.c
index 01ea801bfcfd..bbda910a1bb5 100644
--- a/sys/kern/subr_time.c
+++ b/sys/kern/subr_time.c
@@ -61,10 +61,11 @@ tvhzto(const struct timeval *tvp)
 }
 
 /*
- * Compute number of ticks in the specified amount of time.
+ * Compute the number of full tick periods starting from the last tick
+ * to the next tick after a delay of at least tv, clamped at INT_MAX.
  */
 int
-tvtohz(const struct timeval *tv)
+tvtohz_full(const struct timeval *tv)
 {
 	unsigned long ticks;
 	long sec, usec;
@@ -110,10 +111,10 @@ tvtohz(const struct timeval *tv)
 		ticks = 0;
 	} else if (sec <= (LONG_MAX / 1000000))
 		ticks = (((sec * 1000000) + (unsigned long)usec + (tick - 1))
-		    / tick) + 1;
+		    / tick);
 	else if (sec <= (LONG_MAX / hz))
 		ticks = (sec * hz) +
-		    (((unsigned long)usec + (tick - 1)) / tick) + 1;
+		    (((unsigned long)usec + (tick - 1)) / tick);
 	else
 		ticks = LONG_MAX;
 
@@ -123,6 +124,44 @@ tvtohz(const struct timeval *tv)
 	return ((int)ticks);
 }
 
+/*
+ * Compute number of ticks starting from any time between the last tick
+ * and the upcoming tick to the next tick after a delay of at least tv,
+ * clamped at INT_MAX.
+ *
+ * This is usually one more than tvtohz_full.  For example, suppose tv
+ * holds exactly 2/hz sec:
+ *
+ *	  tvtohz_full starts   tvtohz_full ends, ticks=2
+ *		|               |
+ *		V               V
+ *	<-------+-------+-------+-------+-------+------->
+ *		    ^                   ^
+ *		    |                   |
+ *		tvtohz starts        tvtohz ends, ticks=3
+ *
+ * tvtohz returns the number of ticks starting _now_ to wait for a
+ * delay of tv before firing a tick-based timer; tvtohz_full returns
+ * the number of ticks starting at the _last tick_ to wait for a delay
+ * of tv before firing a tick-based timer.
+ */
+int
+tvtohz(const struct timeval *tv)
+{
+	int ticks = tvtohz_full(tv);
+
+	if (ticks <= 0 || ticks == INT_MAX)
+		return ticks;
+
+	/*
+	 * We rounded up to a number of ticks, but the _first_ tick
+	 * might be shorter than 1/hz sec -- we don't know how long
+	 * before the next hardclock call happens.  So add one to
+	 * account for that, to ensure the delay is at least tv.
+	 */
+	return ticks + 1;
+}
+
 int
 tshzto(const struct timespec *tsp)
 {
@@ -146,7 +185,28 @@ tshztoup(const struct timespec *tsp)
 }
 
 /*
- * Compute number of ticks in the specified amount of time.
+ * Compute the number of full tick periods starting from the last tick
+ * to the next tick after a delay of at least ts, clamped at INT_MAX.
+ */
+int
+tstohz_full(const struct timespec *ts)
+{
+	struct timeval tv;
+
+	/*
+	 * usec has great enough resolution for hz, so convert to a
+	 * timeval and use tvtohz_full() above.
+	 *
+	 * XXX This leads to double-rounding -- down, then up.
+	 */
+	TIMESPEC_TO_TIMEVAL(&tv, ts);
+	return tvtohz_full(&tv);
+}
+
+/*
+ * Compute number of ticks starting from any time between the last tick
+ * and the upcoming tick to the next tick after a delay of at least ts,
+ * clamped at INT_MAX.
  */
 int
 tstohz(const struct timespec *ts)
@@ -156,6 +216,8 @@ tstohz(const struct timespec *ts)
 	/*
 	 * usec has great enough resolution for hz, so convert to a
 	 * timeval and use tvtohz() above.
+	 *
+	 * XXX This leads to double-rounding -- down, then up.
 	 */
 	TIMESPEC_TO_TIMEVAL(&tv, ts);
 	return tvtohz(&tv);
diff --git a/sys/sys/timevar.h b/sys/sys/timevar.h
index 0dac2c01dbbe..855e266cdb13 100644
--- a/sys/sys/timevar.h
+++ b/sys/sys/timevar.h
@@ -214,7 +214,9 @@ int	settimeofday1(const struct timeval *, bool,
 int	timer_create1(timer_t *, clockid_t, struct sigevent *, copyin_t,
 	    struct lwp *);
 int	tstohz(const struct timespec *);
+int	tstohz_full(const struct timespec *);
 int	tvtohz(const struct timeval *);
+int	tvtohz_full(const struct timeval *);
 int	inittimeleft(struct timespec *, struct timespec *);
 int	gettimeleft(struct timespec *, struct timespec *);
 void	timerupcall(struct lwp *);

Follow-Ups:
- Re: PSA: Clock drift and pkgin
  - From: Johnny Billquist
- Re: PSA: Clock drift and pkgin
  - From: Mouse

References:
- Re: PSA: Clock drift and pkgin
  - From: Mouse

Prev by Date: Re: PSA: Clock drift and pkgin
Next by Date: Re: PSA: Clock drift and pkgin
Previous by Thread: Re: PSA: Clock drift and pkgin
Next by Thread: Re: PSA: Clock drift and pkgin
Indexes:

Home | Main Index | Thread Index | Old Index