tech-kern: Re: settimeofday() versus interval tim{ers,ing}

Subject: Re: settimeofday() versus interval tim{ers,ing}
To: Perry E. Metzger <perry@piermont.com>
From: Jonathan Stone <jonathan@DSG.Stanford.EDU>
List: tech-kern
Date: 09/30/1996 19:11:28
>BTW, applications that want to sleep a long time (say an hour) are
>almost always better off sleeping less than that and "correcting"
>periodically.

[other tutorial-level unix programming stuff]

I *am* aware of that: that's why I cited the Gabriel article, in fact.

But:
	* How often is "often enough"? 
	* Is there a generally-safe level?
	* Does that level of "often enough" *change* if settimeofday()
	  becomes more robust and people start using it more???
	* How much does it break existing applications, like cron?
	* iit means

and, more importantly,
	* Can we do better by having the kernel take more care?
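
For concreteness, the sleep-less-and-correct idiom quoted above might be
sketched as follows. This is only an illustration: next_nap(),
sleep_until() and the max_chunk parameter are made-up names, not anything
from the thread or from the kernel. The point it makes is that max_chunk
*is* the "how often" knob: a clock step goes unnoticed for up to
max_chunk seconds.

```c
#include <time.h>
#include <unistd.h>

/*
 * Length of the next nap in seconds: never longer than max_chunk, and
 * zero once the wall clock has reached the deadline.
 * max_chunk is assumed to be nonzero.
 */
unsigned next_nap(time_t now, time_t deadline, unsigned max_chunk)
{
    if (now >= deadline)
        return 0;
    time_t remaining = deadline - now;
    return remaining > (time_t)max_chunk ? max_chunk : (unsigned)remaining;
}

/*
 * Sleep until the wall clock reads `deadline`, re-reading the clock
 * after every nap so that a settimeofday() step is noticed within
 * max_chunk seconds.  Returns the number of naps taken.
 */
unsigned sleep_until(time_t deadline, unsigned max_chunk)
{
    unsigned naps = 0, nap;

    while ((nap = next_nap(time(NULL), deadline, max_chunk)) != 0) {
        sleep(nap);
        naps++;
    }
    return naps;
}
```

A cron-like caller would use something like sleep_until(next_job_time, 60).
Nothing here answers whether 60 is a safe value, which is the question.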

The magnitudes involved are also very different.
Scheduling vagaries don't, usually, starve a process out for multiple
hours.  Dennis is proposing having something that behaves well under
gross misconfiguration errors -- tens of thousands of seconds, for
crying out loud.

Erik is pointing out that there really are systems with clocks and
network links so bad that they're going to jump continually, which
is *another* scenario where timestamp comparison loses, but for a
different reason.


And, more importantly, punting to luser level and letting luser
apps recompute their sleep interval *DOES NOT* fix the problem with
(using Erik Fair's terminology) comparisons of Real-Interval-time
timestamps with Correct-TOD timestamps taken *before* the
immediately-preceding settimeofday(), because the value in boottime is
correct for that settimeofday() epoch *only*; it does not match the
settimeofday()-adjusted value of boottime that was in force after any
preceding call to settimeofday().

Since at least Perry seems not to get the point, perhaps a picture is
required here. Or a tutorial example:

What happens if, in Dennis' original example, the system comes up
with the clock in PST instead of GMT?  Assume no NTP.
After 5 minutes the sysadmin notices it's several hours out of synch,
and resets the clock.  Five minutes later, the sysadmin realizes
s/he's just returned from vacation and their wristwatch is still in
the wrong timezone (say, Hawaii), and jumps the clock again, this time
to the correct timezone.

We now have *three* Correct-TOD epochs: boot to first reset,
which is what, 7 hours out?; first reset to second reset, which is,
uh, 2? hours out; and after the second reset, when the clock jitters
and gets stepped a smaller amount every 15 minutes because it's got a
cheap crystal.  Assume the kernel has live timestamps from each
epoch that it needs to compare to Interval-Real-Time time,
as has been proposed.

In Dennis' scheme, the IntervalTime/Correct-TOD timestamp is
correct for the current epoch, since boottime was readjusted when the
clock jumped; is correct, within the small-jump margin of error, back
to the second reset (since by assumption, boottime hasn't changed
much since then); and is wrong, but in different directions and
by different numbers of hours, for times in the first and second epochs.
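
The three-epoch scenario can be played through in a toy model. Everything
in this sketch is an assumption made for illustration: the struct and
function names are invented, time is in whole seconds, and the offsets
(7 hours slow in the first epoch, 2 hours fast in the second) are just
one reading of "different directions and different numbers of hours".
The only thing borrowed from the proposal is that a single scalar
boottime is readjusted on every clock step.

```c
typedef long seconds_t;

struct clock_model {
    seconds_t uptime;    /* seconds since boot; never steps */
    seconds_t boottime;  /* the one scalar: TOD at which uptime was 0 */
};

/* Current time-of-day estimate. */
seconds_t tod_now(const struct clock_model *c)
{
    return c->boottime + c->uptime;
}

/* Model of a settimeofday() step: only boottime is readjusted. */
void step_clock(struct clock_model *c, seconds_t new_tod)
{
    c->boottime = new_tod - c->uptime;
}

/* Map a TOD timestamp to interval time using the *current* boottime. */
seconds_t tod_to_interval(const struct clock_model *c, seconds_t stamp)
{
    return stamp - c->boottime;
}

/*
 * Take one TOD timestamp in each epoch, then convert all three after
 * the second reset.  err[i] is (recovered uptime - true uptime) for
 * the timestamp taken in epoch i.
 */
void three_epoch_errors(seconds_t err[3])
{
    const seconds_t hour = 3600;
    const seconds_t true_boot = 1000000;   /* arbitrary true TOD of boot */
    struct clock_model c = { 0, true_boot - 7 * hour };
    seconds_t stamp[3], taken_at[3];
    int i;

    c.uptime = 120;                         /* epoch 1: 7 hours slow */
    stamp[0] = tod_now(&c); taken_at[0] = c.uptime;

    c.uptime = 300;                         /* first reset, to the wrong zone */
    step_clock(&c, true_boot + c.uptime + 2 * hour);
    c.uptime = 420;                         /* epoch 2: 2 hours fast */
    stamp[1] = tod_now(&c); taken_at[1] = c.uptime;

    c.uptime = 600;                         /* second reset, now correct */
    step_clock(&c, true_boot + c.uptime);
    c.uptime = 720;                         /* epoch 3 */
    stamp[2] = tod_now(&c); taken_at[2] = c.uptime;

    for (i = 0; i < 3; i++)
        err[i] = tod_to_interval(&c, stamp[i]) - taken_at[i];
}
```

With these assumed offsets the errors come out as -25200, +7200 and 0
seconds: only the timestamp from the current epoch converts sanely.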

Or, in case it's still not clear: there are timestamps and itimers
around, which in Dennis' scheme were computed from three wildly
different values of boottime. boottime is a scalar.  Only one of the
three groups will give even *sane* comparisons with "Real Time"
timestamps computed using Dennis' invariant.  What do you propose
we do?  *I'd* say make all timestamps be relative, i.e., clock
ticks since boot.  But that loses for comparisons with timestamps
that are *required* to be Correct Time-of-Day, e.g., filesystem
timestamps.


I have yet to see any message that addresses, or even acknowledges,
this point.  I'm sorry that the user-level case was a bad example to
illustrate what I think are real problems in Dennis's design. I may
be completely missing the point there; but it looks like a hole in
what was proposed.  Maybe it's not important because it's a "rare
case", which for engineering reasons we don't need to fix even though
it's wrong.  Maybe it's not important because I'm not understanding
the proposal correctly and I'm raising stupid objections.

If I *am* wrong, or there are engineering reasons why the problems I'm
pointing out aren't valid, I'd appreciate someone pointing it out,
instead of taking implementation-level potshots at an (admittedly
poorly chosen) example of a much deeper problem.



>Among other things, you are supposedly sleeping for a fixed
>number of seconds,  not wall time, according to the specs

I don't give a damn what the spec (which one?) or the manpage says.

I made a point to *explicitly* say I was suggesting extending the
interface *beyond* the spec, by (bad solution) adding a new syscall,
or (better solution) adding a new kind of (sic) itimer, which is like
ITIMER_REAL but *does* track the system's current ``correct-time-of-day''
estimate, for processes that want that semantics.
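
To make the difference between the two timer semantics concrete, here is
a toy model, not kernel code. TIMER_TOD stands in for the proposed new
itimer kind and is entirely hypothetical, as are sim_clock, sim_timer,
arm_after() and timer_expired(). The sketch assumes, as above, that TOD
is boottime plus uptime and that a clock step only changes boottime.

```c
#include <stdbool.h>

enum timer_kind {
    TIMER_INTERVAL,  /* counts elapsed uptime; ignores clock steps */
    TIMER_TOD        /* hypothetical: follows the current TOD estimate */
};

struct sim_clock {
    long uptime;     /* seconds since boot; never steps */
    long boottime;   /* readjusted on every clock step */
};

struct sim_timer {
    enum timer_kind kind;
    long deadline;   /* in uptime seconds or TOD seconds, per kind */
};

long sim_tod(const struct sim_clock *c)
{
    return c->boottime + c->uptime;
}

/* Arm a timer to fire `delay` seconds from now, in its own timebase. */
struct sim_timer arm_after(const struct sim_clock *c,
                           enum timer_kind kind, long delay)
{
    long now = (kind == TIMER_TOD) ? sim_tod(c) : c->uptime;
    struct sim_timer t = { kind, now + delay };
    return t;
}

/* A timer has expired once its own timebase reaches the deadline. */
bool timer_expired(const struct sim_clock *c, const struct sim_timer *t)
{
    long now = (t->kind == TIMER_TOD) ? sim_tod(c) : c->uptime;
    return now >= t->deadline;
}
```

If both kinds are armed for an hour and the clock is then stepped forward
an hour ten minutes later, the TIMER_TOD timer is already due while the
TIMER_INTERVAL timer still has fifty minutes to run. Which of those a
process like cron wants is exactly what the interface would let it choose.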

All this, of course, assuming we divorce the Interval Real-Time from
Current-Time-Of-Day.  I thought it was obvious that that wasn't so
simple as Dennis claimed.  Looks like I'm wrong about the obviousness.
No-one has (yet) addressed the simplicity.

Oh, yes, I *do* sincerely hope someone pokes holes all through what I'm
saying.  Dennis' solution would be really neat if it worked.