NetBSD-Bugs archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

kern/58925: itimer(9) responds erratically to clock wound back



>Number:         58925
>Category:       kern
>Synopsis:       itimer(9) responds erratically to clock wound back
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Dec 21 15:40:01 +0000 2024
>Originator:     Taylor R Campbell
>Release:        current, 10, 9, ...
>Organization:
The IntervalBSD Lostation
>Environment:
>Description:
Suppose you set an interval timer to fire every minute starting at 03:00:00 on the real-time clock.

Suppose the clock is wound back to 02:00:00 (because of initial ntp adjustment, not because of misguided summer time configuration, say).

Suppose when the underlying callout fires, the clock reads 02:00:01.

What clock time does the timer get rescheduled for?

(a) 02:01:00
(b) 02:01:36
(c) 02:01:52
(d) 02:02:00
(e) 03:01:00

I think the correct answer is (a) and all the other ones are wrong.  Unfortunately, what we currently implement -- assuming it is actually the wee hours of the morning of January 1st, 1970, in Reykjavik -- is (b).  And before we switched from microsecond-based calculations to nanosecond-based calculations, for a few months in 2008 what we implemented was (c).

Why did this happen?  It happens in the logic that tries to compute the end of the interval [it_value - k*it_interval, it_value - (k - 1)*it_interval] that the current time lies in (now_ns is the current time in nanoseconds since the epoch, last_val is it_time in nanoseconds since the epoch, and interval is it_interval in nanoseconds):

    839 	uint64_t last_val, next_val, interval, now_ns;
...
    876 		next_val = now_ns +
    877 		    (now_ns - last_val + interval - 1) % interval;
    878 
    879 		if (backwards)
    880 			next_val += interval;

https://nxr.netbsd.org/xref/src/sys/kern/kern_time.c?r=1.223#836

Here, now_ns = 7201*10^9, last_val = 10800*10^9, and interval = 60*10^9.  So now_ns - last_val = -3599*10^9, and now_ns - last_val + interval - 1 -3539*10^9 - 1.  So reducing modulo interval (with truncating division, as C uses) should give -59*10^9 - 1, right?

Except this is unsigned 64-bit arithmetic!  So the number we're reducing modulo interval is not -3539*10^9 - 1 but rather 2^64 - 3539*10^9 - 1 = 18446747612709551617, so the reduction gives 34709551615 ~ 34.7sec.  Add that to now_ns and interval and you get 7295709551615 or about 02:01:36.

Now there's another problem which is that this logic isn't quite right anyway.  Even if we switch it to signed arithmetic, we end up scheduling it at 7202*10^9 - 1 ~ 02:00:02, a nanosecond before than one second after it previously fired!  And there's a stray nanosecond fencepost.

Note that C division is truncating (round quotient toward zero), which means, e.g.:

-3599 % 60 = -59

rather than 1 as you would get with flooring division (round quotient toward negative infinity) or euclidean division (remainder is always nonnegative).  So the sign of the remainder, in the clock-wound-backwards case, is always negative -- it's what you have to add to the end of the current interval to get back to now.  Or, it's what you have to _subtract_ from now to get to the end of the current interval.

So I think what this should really be is not:

now_ns += interval + ((now_ns - last_ns + interval - 1) % interval);

but rather (with an adjustment for the nanosecond of fencepost error to account for the just-less-than-interval addend being divided so it is not discarded altogether by reduction):

now_ns -= ((now_ns - last_ns + interval - 1) % interval) + 1;

Of course, that's only for the clock-wound-backward case; for the clock-wound-forward case the sign has to be reversed.  So we end up with:

/*
 *          now [backwards]        overruns    now [forwards]
 *           |                      v    v      |
 * |--+----+-*--x----+----+----|----+----+----+-*--x----+-->
 *           \__/                               \__/
 *         remainder                          remainder
 *       (nonpositive)                      (nonnegative)
 */

remainder = ((now_ns - last_ns + interval - 1) % interval) + 1;

if (backwards) {
        now_ns -= remainder;
} else {
        now_ns += remainder;
        it->it_overruns += (now_ns - last_ns) % interval; /* XXX int overflow */
}
>How-To-Repeat:
1. set a periodic interval timer (setitimer, timer_settime, timerfd_settime)
2. wind the clock back (settimeofday, clock_settime, ntp_adjtime, ...)
>Fix:
As above -- but let's get some automatic tests for this arithmetic written down first!



Home | Main Index | Thread Index | Old Index