tech-kern: settimeofday() versus interval tim{ers,ing}

Subject: settimeofday() versus interval tim{ers,ing}
To: None <tech-kern@NetBSD.ORG>
From: Dennis Ferguson <dennis@jnx.com>
List: tech-kern
Date: 09/29/1996 19:10:35
I've been trying to get a kernel to the point where user-space programs
which are using kernel time and timer facilities can be made reliable
across time-of-day changes made by settimeofday().  The particular problems
I'm having are with a program which uses interval timers, and which also
uses gettimeofday() to try to maintain accurate, non-drifting callout
queues internally.

To see the problem with interval timers (documented by a comment in
kern/kern_time.c:sys_settimeofday() since 4.2BSD), run the following
program, measure the time interval between beeps with your watch,
and then change the time of day.  Note that you always get an inaccurate
interval one interrupt after the date change.  If you set the clock
back by, say, half an hour, the inaccurate interval will be roughly half
an hour long, and any program waiting for an interval timer interrupt will
essentially hang.

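#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <unistd.h>

#define	TIMEOUT	8		/* number of seconds to timeout */

static void alarmed(int);

int
main(void)
{
	struct itimerval itv;

	/* First SIGALRM after TIMEOUT seconds, then every TIMEOUT seconds. */
	itv.it_interval.tv_sec = itv.it_value.tv_sec = TIMEOUT;
	itv.it_interval.tv_usec = itv.it_value.tv_usec = 0;

	signal(SIGALRM, alarmed);
	if (setitimer(ITIMER_REAL, &itv, (struct itimerval *) 0) == -1) {
		perror("setitimer");
		exit(1);
	}

	for (;;) {
		(void) pause();		/* sleep until SIGALRM arrives */
	}
}

static void
alarmed(int sig)
{
	/* write(2) is async-signal-safe; printf(3) is not. */
	static const char msg[] = "beep\n";

	(void) sig;
	(void) write(STDOUT_FILENO, msg, sizeof(msg) - 1);
}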

The problem with the interval timer implementation is sort of generic.  While
some uses of time really want the time of day (for example, to determine
file system timestamps), other uses don't really care about the time of
day and just want time stamps having the property that, if you have two
time stamps T1 and T2 then (T2 - T1) is the approximate elapsed real time
interval between the two times.  Unfortunately, settimeofday() causes
time of day timestamps to violate the latter property.

In any case, there seem to be two ways to fix the interval timer code.  The
first is to add a hook to settimeofday() that adjusts the timestamp of each
process with an active timer (as was done for FreeBSD recently, and as
NetBSD already does, in effect, for runtime measurements and the NFS lease
timeout queue).  The second is to provide a source of timestamps for interval
timing such that the difference between two timestamps is guaranteed always
to be the elapsed real time, and to make the interval timer code use this,
rather than the time of day, to maintain its timeout queue.

The reason I dislike the FreeBSD fix of adding a hook in settimeofday() is
that interval timers aren't the only things that may eventually need hooks
there.  As mentioned above, both runtime and the NFS lease queue are already
adjusted in settimeofday().  In fact, I count not quite 40 kernel files
where the difference between two time-of-day timestamps is computed, so
there are potentially other places where breakage across settimeofday()
calls may be found, each of which would also need a hook in settimeofday()
to repair.  This is especially annoying because most of these places don't
care about the time of day and just want the timestamps for measuring
intervals.

So the other method of dealing with this is to maintain a time in the
kernel which explicitly guarantees that the difference between any two
timestamps will be the elapsed real time.  The kernel actually does maintain
such a time, called `mono_time', so there is precedent for this.  The
trouble with `mono_time's implementation is that:

(a) it is initialized to the time of day returned by the system's hardware
    time-of-day clock at boot time, and then counted up from there.  This
    means the value is essentially a random number, and is dependent on
    the hardware for reasonableness (IBM RTs had a battery-powered
    time-of-day clock, and would initialize the time of day to a value that
    was nearly all ones when the battery died.  If you didn't change the
    time it would roll over after a couple days.  On such a machine
    mono_time would not even be guaranteed to be monotonically increasing).

(b) mono_time doesn't get adjusted by adjtime() or by the NTP code, which
    means you don't get the benefit of the clock frequency disciplining which
    NTP provides when it's running.

(c) there is no way to get microsecond-accurate mono_time values.

(d) there is no way for user-space programs to read mono_time.

(e) there is no way to convert a mono_time timestamp to a time-of-day
    timestamp, even though this would sometimes be useful, for example
    when you want to print out a timestamp value.

What I've been thinking about instead is replacing mono_time with a
new struct timeval variable, perhaps named `uptime', which measures
the time since the system was booted.  That is, `uptime' would be maintained
such that (time == boottime + uptime) is always true.  Since the time
elapsed since boot increases monotonically and continuously at about
real time, so would `uptime'.  The implementation would also need
the following:

(1) settimeofday() would alter `time' and `boottime', but would leave
    `uptime' alone.  The equality (time == boottime + uptime) would be
    maintained.

(2) adjtime(), and the fancy NTP adjustment code, would operate on both
    `uptime' and `time', so both are advanced at the same rate.

(3) microtime() would acquire a second argument which would be a pointer
    to either `time' or `uptime'.  This would allow microsecond-accurate
    up-time-stamps, as well as time-of-day-stamps, to be obtained.

(4) Interval timers, the NFS lease timeout code and runtime (and whatever
    else exists that does interval timing with `time') would be recoded to
    use `uptime'.  All hooks in settimeofday() would be removed.

(5) A system call returning microsecond-accurate `uptime' values would be added,
    something like

	getsystimes(struct timeval *uptime, struct timeval *boottime)

    This would accomplish two things.  User programs which use timestamps
    for interval timing would be able to access an accurate `uptime' and use
    it the same way the kernel does, and would be able to easily access
    `boottime' either to print `uptime' timestamps as times of day or to
    detect step changes in the time of day and respond to them if they care.

Does any of this make sense to anyone?  I'd really like interval timers
fixed, and I'd rather not add the FreeBSD hook to adjust them in settimeofday().

Dennis Ferguson