Subject: Re: settimeofday() versus interval tim{ers,ing}
To: Jonathan Stone <jonathan@dsg.stanford.edu>
From: Dennis Ferguson <dennis@jnx.com>
List: tech-kern
Date: 09/30/1996 22:51:52
Jonathan,

Sorry, I had a deadline today.  I'll try to catch up on this now.

> That is, I think there's a real architectural issue here.  Suppose a
> process wants to set an interval timer.  The process could want to
> sleep for a specified interval of real (oh, let's say UTC and ignore
> leap-seconds for now) time.  Or the process could want to sleep until
> a specific *point* in time.
[...]
> Or, in other words, a FreeBSD-style "fix" that does the right
> thing for your gated will do the *wrong* thing for my process sleeping
> until 5 minutes before a meeting; and vice-versa.

You seem to be under the impression that the timer set by setitimer()
will actually keep you on time for your meeting the way it is now, and
that I am breaking this by suggesting its behaviour be changed.  This
is not true.  Not only does nothing in setitimer(2) suggest that it
implements your behaviour, or was even intended to implement your behaviour,
but it also doesn't do it in practice.

In particular, suppose that at 9:55, according to your computer's clock,
you call setitimer() and tell it to expire in 30 minutes, with an interval
of 30 minutes, so you can go for coffee at 10:25 and be reminded at 10:55
to get to your 11:00 meeting.  Suppose someone comes in at 10:26 and
corrects your clock by setting it forward by 27 minutes, to 10:53.  You are
now late for your meeting, since the interval timer won't expire again
until 11:22, and once the damage is done the expiry after that will be at
11:25, only three minutes later.
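
The arithmetic in that example can be checked with a small model.  This is
only a sketch, not the kernel code: it assumes the timeout is scheduled as
a count of elapsed minutes when it is armed, while the reload is computed
from the absolute expiry time, an assumption chosen because it reproduces
the numbers above.  The struct and function names are made up for
illustration, and all values are minutes since midnight.

```c
#include <assert.h>

/* Hypothetical model of a real-time interval timer (minutes since midnight). */
struct itimer_model {
	int it_value;		/* absolute clock reading of the next expiry */
	int it_interval;	/* reload interval */
	int ticks_left;		/* elapsed minutes until the timeout fires */
};

/* Arm the timer: the delay is turned into a countdown once, at arm time. */
static void
itimer_arm(struct itimer_model *t, int now, int delay, int interval)
{
	assert(delay > 0 && interval > 0);
	t->it_value = now + delay;
	t->it_interval = interval;
	t->ticks_left = delay;
}

/* Real time passes: the clock and the countdown both move. */
static int
clock_advance(struct itimer_model *t, int now, int minutes)
{
	t->ticks_left -= minutes;
	return now + minutes;
}

/* Clock reading at which the pending timeout fires, if nobody steps again. */
static int
itimer_fires_at(const struct itimer_model *t, int now)
{
	return now + t->ticks_left;
}

/* Timeout fired: push it_value into the future and restart the countdown. */
static void
itimer_reload(struct itimer_model *t, int now)
{
	do {
		t->it_value += t->it_interval;
	} while (t->it_value <= now);
	t->ticks_left = t->it_value - now;
}
```

Arm at 9:55 with a 30 minute delay and interval, advance to 10:25, reload,
advance one minute, then add 27 to the clock without touching the
countdown.  In this model itimer_fires_at() then reports 11:22, and after
the following reload it reports 11:25.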

This behaviour is utterly counter-intuitive, and useful to no one.  Someone
who set the timer at 10:25 expecting to be awakened at 10:55 is shit out of
luck.  Someone who set the interval to 1/2 hour expecting to be awakened every
half hour is in a similar boat.  setitimer() doesn't do anyone's job right
the way it is now.

Given that setitimer() is broken all round, all I'm really suggesting we
do is to fix it to match the manual page.  The other behaviour might be
useful, but setitimer() doesn't do it now and doesn't appear from the
manual page to ever have been intended to do it.  Supporting time-of-day
timers is an issue for another day; I'm only trying to get current
facilities to match their manual page.

> I think I mis-understand now what you wanted to do.  Fiddling
> boottime to maintain the invariant you want sounds, well, gross.

Actually I'm not suggesting doing anything in particular with boottime
that isn't done right now.  The whole idea of keeping `uptime' as a
separate variable stemmed from the observation that while d(time)/dt
is discontinuous across calls to settimeofday(), d(time - boottime)/dt
is not, *in the current kernel* (that is, you'll never see the output
of uptime(1) jump forward or back even now).  boottime is adjusted
in settimeofday() already, and the idea came from working backwards
from this fact.  I'm not suggesting changing the current invariant,
or inventing anything new, just making a value which implicitly
exists already explicitly available to things that can make use of it.
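
To make the invariant concrete, here is a minimal userland sketch.  It
assumes only what is said above, that settimeofday() moves boottime by the
same amount it moves time; the struct and function names are hypothetical
and values are plain microsecond counts rather than struct timeval.

```c
#include <assert.h>

/* Hypothetical model of the two kernel variables, in microseconds. */
struct clock_model {
	long long time;		/* time of day */
	long long boottime;	/* time of day at which the system booted */
};

/* One clock tick: only `time' advances. */
static void
model_tick(struct clock_model *c, long long tick_usec)
{
	assert(tick_usec > 0);
	c->time += tick_usec;
}

/* Step the clock, shifting boottime by the same delta. */
static void
model_settimeofday(struct clock_model *c, long long new_time)
{
	long long delta = new_time - c->time;

	c->time = new_time;
	c->boottime += delta;
}

/* The value that exists implicitly: time since boot. */
static long long
model_uptime(const struct clock_model *c)
{
	return c->time - c->boottime;
}
```

However far model_settimeofday() steps the clock, forward or back,
model_uptime() returns the same value immediately before and after the
call, and it only changes in model_tick().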

Again, I'm not suggesting anything new in the way of functionality, rather
I'm just trying to find a method of fixing things which are demonstrably
broken now.  And, ignoring functionality which doesn't exist and
focussing on that which does exist now, this change actually has negative
cost.  In exchange for 5 lines of code in hardclock() to maintain `uptime',
and a line or two in each microtime(), you can immediately save 2 lines of
code in settimeofday() and about 50 in nfs/nfs_nqlease.c.  You can also
fix select()/poll() (which I think can also have their timeout hung by a
backward clock step if you get them just right, though I've not been
able to do this yet) and code in many of the following files, most of
which also appear to be trying to use `time' to do interval timing:

	arch/alpha/wscons/kbd.c
	arch/amiga/dev/if_bah.c
	arch/arm32/mainbus/kbd.c
	arch/arm32/mainbus/pms.c
	arch/arm32/mainbus/qmouse.c
	arch/atari/dev/zs.c
	arch/hp300/dev/dcm.c
	arch/i386/isa/pcvt/pcvt_sup.c
	arch/mac68k/dev/adb.c
	arch/mac68k/dev/adbsys.c
	arch/mac68k/dev/z8530tty.c
	arch/sparc/dev/zs.c
	arch/sparc/sparc/intr.c
	arch/vax/uba/uba.c
	arch/vax/vax/ka650.c
	arch/x68k/dev/pow.c
	arch/x68k/dev/zs.c
	dev/ic/z8530tty.c
	dev/isa/gus.c
	kern/kern_resource.c
	kern/kern_synch.c
	net/if_arcsubr.c
	net/if_ethersubr.c
	net/if_fddisubr.c
	net/if_ppp.c
	net/if_sl.c
	net/if_strip.c
	netccitt/pk_acct.c
	netccitt/pk_subr.c
	netinet/if_ether.c
	netinet/ip_mroute.c
	netiso/iso_snpac.c
	nfs/nfs_socket.c
	nfs/nfs_syscalls.c
	nfs/nfs_vfsops.c
	ufs/ffs/ffs_vfsops.c
	ufs/lfs/lfs_alloc.c
	ufs/ufs/ufs_quota.c

(this list was compiled by looking for things which took the difference
between two time-of-day's, so I may be wrong about some of it).
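
The pattern being searched for looks roughly like the following.  This is
a hypothetical illustration and not code from any of the files listed; the
names are made up, and `uptime' here stands for a counter that only the
clock tick advances.

```c
#include <assert.h>

/* Hypothetical model, whole seconds only. */
struct kclock {
	long time;	/* time of day; may be stepped by settimeofday() */
	long uptime;	/* time since boot; only the tick advances it */
};

/* What the tick handler does in this model: advance both clocks. */
static void
kclock_tick(struct kclock *k, long secs)
{
	assert(secs >= 0);
	k->time += secs;
	k->uptime += secs;
}

/* A clock step changes the time of day and nothing else. */
static void
kclock_step(struct kclock *k, long delta)
{
	k->time += delta;
}

/* Interval check as a difference of two time-of-days (the fragile form). */
static int
timed_out_by_time(const struct kclock *k, long start_time, long limit)
{
	return k->time - start_time >= limit;
}

/* The same check made against uptime. */
static int
timed_out_by_uptime(const struct kclock *k, long start_uptime, long limit)
{
	return k->uptime - start_uptime >= limit;
}
```

With a 30 second limit, a backward step of an hour part way through leaves
timed_out_by_time() false long after 30 seconds have really elapsed, while
a forward step makes it true early.  timed_out_by_uptime() is unaffected
in both cases.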

I do admit that I have avoided any consideration of what to do for
processes which want to do something at a particular time of day.  I
do so because I don't know what to do for them other than let them
fend for themselves; this is an issue for someone smarter than me.
Despite this, however, I would strongly assert that the inability to
do anything for processes which care about the time of day should
not prevent one from fixing those that couldn't give a crap about
the time of day but still break across calls to settimeofday().  That
is, even if it isn't possible to write a program which cares about
the time of day which operates reliably across a call to settimeofday(),
it should still be possible to have programs which don't care about the
time of day be unaffected by this.  And it should be possible to
have the behaviour of system calls actually match their manual page.

Oh, one other thing:

> The concept of "elapsed real time" isn't very well defined
> when time-travelling. What does "elapsed real time" mean in this context?
> 
>	a) Real-time clock ticks, scaled by the clock period?  
>	b) Real-time clock ticks as disciplined by adjtime() ?
>
>	c) Real-time clock ticks as disciplined by NTP's PLL,  or FLL for
>	 intermittenntly-connected  sites?
>
>	d) "Real time" as measured by UTC, or as best the system can
>	  approximate?

I'm pretty sure that the best answer here is (b) or (c).  While there is
a much longer (and interesting, though probably not relevant here)
answer to this, the short answer is related to the distinction between
`phase' and `frequency' which ntp (and timed, though in a less clean
fashion) draws.  NTP distinguishes two types of errors, `phase' errors
and `frequency' errors.  `phase' errors are when your clock is just
wrong, and needs to be set right.  `frequency' errors are when your
clock is wrong, but may have been `right' in the past and got `wrong'
because it is running too fast or too slow.  NTP always fixes phase
errors by stepping your clock with settimeofday(); the fact that it
has no patience for phase errors is the reason the clock will jump
around a lot on the end of a funky circuit.  Thus NTP (and timed also,
in fact, though timed is a lot more crufty) only calls adjtime(), or
the fancy kernel PLL (which I think is entirely inappropriate for
hardclock(), by the way, the damage should have been fixed by fixing
adjtime() and leaving the PLL in user space, but I digress), when it
thinks the clock frequency is off and it is trying to discipline it into
sync.  That is, despite appearances to the contrary, NTP never slews the
clock to get it phase-accurate, it really slews the clock to compute a
frequency correction and to keep the clock running more accurately once
this is done.

So if the point is to choose a time whose rate of advance matches
d(UT0)/dt most accurately, you'll get the best results by far in
the presence of xntpd with (b) or (c).  Of course the distinction
between (a), (b) and (c) could only matter to the truly anal retentive
but, given a choice, you might as well use the most accurate values
available and, over the long term, that'll always be (b)/(c) when
ntp is operating.
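
For what the difference between (a) and (b) amounts to, here is a rough
sketch of an adjtime()-style slew, loosely modelled on the way a BSD
hardclock() adds a small per-tick correction while an adjustment is
outstanding.  The struct, the function names and the parameters are
assumptions made for illustration, not the kernel's own.

```c
#include <assert.h>

/* Hypothetical model; all values in microseconds. */
struct slew_model {
	long long raw_usec;	/* (a): ticks scaled by the nominal period */
	long long disc_usec;	/* (b): ticks as disciplined by the slew */
	long timedelta;		/* correction still outstanding */
	long tickdelta;		/* per-tick correction, same sign as above */
};

/* Request a slew of delta_usec, applied per_tick_usec at a time. */
static void
slew_adjtime(struct slew_model *m, long delta_usec, long per_tick_usec)
{
	assert(per_tick_usec > 0);
	/* Truncate to a multiple of the step so the countdown reaches zero. */
	m->timedelta = (delta_usec / per_tick_usec) * per_tick_usec;
	m->tickdelta = (delta_usec < 0) ? -per_tick_usec : per_tick_usec;
}

/* One clock tick: (b) gets the correction while any is outstanding. */
static void
slew_tick(struct slew_model *m, long tick_usec)
{
	long delta = tick_usec;

	if (m->timedelta != 0) {
		delta += m->tickdelta;
		m->timedelta -= m->tickdelta;
	}
	m->raw_usec += tick_usec;
	m->disc_usec += delta;
}
```

While a correction is outstanding the two counters advance at slightly
different rates; once it is used up they advance in step again, offset by
the total correction applied.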

Dennis Ferguson