NetBSD-Bugs archive
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]
kern/58925: itimer(9) responds erratically to clock wound back
>Number: 58925
>Category: kern
>Synopsis: itimer(9) responds erratically to clock wound back
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Dec 21 15:40:01 +0000 2024
>Originator: Taylor R Campbell
>Release: current, 10, 9, ...
>Organization:
The IntervalBSD Lostation
>Environment:
>Description:
Suppose you set an interval timer to fire every minute starting at 03:00:00 on the real-time clock.
Suppose the clock is wound back to 02:00:00 (because of initial ntp adjustment, not because of misguided summer time configuration, say).
Suppose when the underlying callout fires, the clock reads 02:00:01.
What clock time does the timer get rescheduled for?
(a) 02:01:00
(b) 02:01:36
(c) 02:01:52
(d) 02:02:00
(e) 03:01:00
I think the correct answer is (a) and all the other ones are wrong. Unfortunately, what we currently implement -- assuming it is actually the wee hours of the morning of January 1st, 1970, in Reykjavik -- is (b). And before we switched from microsecond-based calculations to nanosecond-based calculations, for a few months in 2008 what we implemented was (c).
Why did this happen? It happens in the logic that tries to compute the end of the interval [it_value - k*it_interval, it_value - (k - 1)*it_interval] that the current time lies in (now_ns is the current time in nanoseconds since the epoch, last_val is it_time in nanoseconds since the epoch, and interval is it_interval in nanoseconds):
839 uint64_t last_val, next_val, interval, now_ns;
...
876 next_val = now_ns +
877 (now_ns - last_val + interval - 1) % interval;
878
879 if (backwards)
880 next_val += interval;
https://nxr.netbsd.org/xref/src/sys/kern/kern_time.c?r=1.223#836
Here, now_ns = 7201*10^9, last_val = 10800*10^9, and interval = 60*10^9. So now_ns - last_val = -3599*10^9, and now_ns - last_val + interval - 1 -3539*10^9 - 1. So reducing modulo interval (with truncating division, as C uses) should give -59*10^9 - 1, right?
Except this is unsigned 64-bit arithmetic! So the number we're reducing modulo interval is not -3539*10^9 - 1 but rather 2^64 - 3539*10^9 - 1 = 18446747612709551617, so the reduction gives 34709551615 ~ 34.7sec. Add that to now_ns and interval and you get 7295709551615 or about 02:01:36.
Now there's another problem which is that this logic isn't quite right anyway. Even if we switch it to signed arithmetic, we end up scheduling it at 7202*10^9 - 1 ~ 02:00:02, a nanosecond before than one second after it previously fired! And there's a stray nanosecond fencepost.
Note that C division is truncating (round quotient toward zero), which means, e.g.:
-3599 % 60 = -59
rather than 1 as you would get with flooring division (round quotient toward negative infinity) or euclidean division (remainder is always nonnegative). So the sign of the remainder, in the clock-wound-backwards case, is always negative -- it's what you have to add to the end of the current interval to get back to now. Or, it's what you have to _subtract_ from now to get to the end of the current interval.
So I think what this should really be is not:
now_ns += interval + ((now_ns - last_ns + interval - 1) % interval);
but rather (with an adjustment for the nanosecond of fencepost error to account for the just-less-than-interval addend being divided so it is not discarded altogether by reduction):
now_ns -= ((now_ns - last_ns + interval - 1) % interval) + 1;
Of course, that's only for the clock-wound-backward case; for the clock-wound-forward case the sign has to be reversed. So we end up with:
/*
* now [backwards] overruns now [forwards]
* | v v |
* |--+----+-*--x----+----+----|----+----+----+-*--x----+-->
* \__/ \__/
* remainder remainder
* (nonpositive) (nonnegative)
*/
remainder = ((now_ns - last_ns + interval - 1) % interval) + 1;
if (backwards) {
now_ns -= remainder;
} else {
now_ns += remainder;
it->it_overruns += (now_ns - last_ns) % interval; /* XXX int overflow */
}
>How-To-Repeat:
1. set a periodic interval timer (setitimer, timer_settime, timerfd_settime)
2. wind the clock back (settimeofday, clock_settime, ntp_adjtime, ...)
>Fix:
As above -- but let's get some automatic tests for this arithmetic written down first!
Home |
Main Index |
Thread Index |
Old Index