Subject: settimeofday() versus interval tim{ers,ing}
To: Dennis Ferguson <>
From: Jonathan Stone <jonathan@DSG.Stanford.EDU>
List: tech-kern
Date: 09/30/1996 13:37:29
>So the other method of dealing with this is to maintain a time in the
>kernel which explicitly guarantees that the difference between any two
>timestamps will be the elapsed real time.

I'm dumb sometimes, I just don't get this...

The concept of "elapsed real time" isn't very well defined
when time-travelling. What does "elapsed real time" mean in this context?

	a) Real-time clock ticks, scaled by the clock period?  
	b) Real-time clock ticks as disciplined by adjtime() ?

	c) Real-time clock ticks as disciplined by NTP's PLL, or FLL for
	 intermittently-connected sites?

	d) "Real time" as measured by UTC, or as best the system can

I'd be happy to accept a), except for ugly implementation hacks with
'fat ticks' once per second.  I'd be happy to accept c), provided the
in-kernel NTP loop has locked onto a time source.  I'd accept d), if
we could specify what an acceptable approximation was.  I don't like
b) for obvious reasons, and I don't like c) in periods when the clock
is slewing fast, e.g., at startup, because it's not really "real" time.
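To make option a) concrete, here is a minimal sketch of "ticks scaled by the clock period".  The helper name, the nanosecond period parameter and the 100Hz figure in the usage note are assumptions made for illustration, not anything taken from the kernel sources.  It also assumes every tick has the same length, which is exactly what the 'fat ticks' hack violates.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Hypothetical helper: convert a free-running tick count into elapsed
 * time, assuming every tick lasts exactly period_ns nanoseconds.
 */
static struct timespec
ticks_to_timespec(uint64_t ticks, uint64_t period_ns)
{
	struct timespec ts;
	uint64_t total_ns;

	/* Refuse a zero period and a product that would overflow. */
	assert(period_ns > 0);
	assert(ticks <= UINT64_MAX / period_ns);

	total_ns = ticks * period_ns;
	ts.tv_sec = (time_t)(total_ns / NSEC_PER_SEC);
	ts.tv_nsec = (long)(total_ns % NSEC_PER_SEC);
	return ts;
}
```

With a 100Hz clock (a 10ms period, assumed here), 250 ticks come out as 2 seconds and 500000000 nanoseconds.  Nothing in this view of time is ever disciplined, so it drifts with the oscillator, but it never steps.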

It's fair to note that I happily run stratum-2 NTP servers with ~3ms
delay (over 100Mbit ethernet and a T3 line) to stratum-1 servers, and
keep local client system clocks synched well enough to compare network
traces on adjacent machines, using local timestamps to collate

>(e) there is no way to convert a mono_time timestamp to a time-of-day
>    timestamp even though it would sometimes be useful to be able to do
>    this when, for example, you want to print out a timestamp value.

I don't see how to fix this at *all*.  Settimeofday can set the clock
*backwards* as well as forwards.  That implies
	* time-of-day timestamps are not necessarily unique; and
	* they are not totally ordered in "mono time" by their values

I don't see any good way to even attempt this in the presence of
backwards-time-travelling clocks. I'm contemplating inventing an
"epoch number" for each time the clock gets changed backwards, tagging
each timestamp with its epoch number, incrementing the epoch each
time the clock is stepped, and keeping a history of the clock step
applied at each epoch boundary, to do the conversion.
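A rough sketch of that epoch idea, to show what the bookkeeping might look like.  Every name here (tagged_ts, step_history, MAX_EPOCHS and the three functions) is hypothetical, and the sketch assumes the epoch is bumped on every step, forwards or backwards, and that the clock runs at the true rate between steps.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_EPOCHS 16		/* hypothetical bound on remembered steps */

/* A time-of-day reading tagged with the epoch it was taken in. */
struct tagged_ts {
	uint32_t epoch;		/* clock steps seen before this reading */
	int64_t  tod_ns;	/* time-of-day reading, in nanoseconds */
};

/* step_ns[i] is the step applied when leaving epoch i for epoch i+1. */
struct step_history {
	uint32_t cur_epoch;
	int64_t  step_ns[MAX_EPOCHS];
};

/* Record a clock step of delta_ns (negative means backwards). */
static int
history_step(struct step_history *h, int64_t delta_ns)
{
	if (h->cur_epoch >= MAX_EPOCHS)
		return -1;	/* history full; the sketch just gives up */
	h->step_ns[h->cur_epoch++] = delta_ns;
	return 0;
}

/* Re-express a tagged timestamp on the current epoch's timescale. */
static int64_t
rebase_ns(const struct step_history *h, struct tagged_ts ts)
{
	int64_t t = ts.tod_ns;
	uint32_t e;

	assert(ts.epoch <= h->cur_epoch);
	for (e = ts.epoch; e < h->cur_epoch; e++)
		t += h->step_ns[e];
	return t;
}

/* Elapsed time from a to b, immune to any steps in between. */
static int64_t
elapsed_ns(const struct step_history *h, struct tagged_ts a,
    struct tagged_ts b)
{
	return rebase_ns(h, b) - rebase_ns(h, a);
}
```

For example: take a reading of 1000 in epoch 0, step the clock back by 3000 when it reads 5000, then take a reading of 2500 in epoch 1.  The raw values say 1500 elapsed; rebasing the first reading gives -2000, so the sketch reports 4500, which is the 4000 that passed before the step plus the 500 after it.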

I don't imagine that's what you meant when you listed (e) as something
you wanted to fix.  Could you explain again how your proposed
implementation addresses this?  Is it by stepping "boottime"
synchronously [:-)] with settimeofday() steps, or something else?

The more I look at this, the more I'm convinced that your proposed
implementation is thoroughly correct for the scenario where some
unfortunate DOS or L***x refugee reboots their machine, setting the
system time-of-day from a nonvolatile clock under the misapprehension
the nonvolatile clock is in GMT, when in fact the clock is running in
local time.  

I haven't thought through Dennis' proposal to be sure it's correct for
other time-change scenarios, or for all uses of ``real-time''
timestamps.  To pick one example, the practice of using
as-close-as-possible-to-GMT for *file* timestamps is too established
to change.  (Changing it locally would also break NFS timestamps)

How many of the 40+ files using timevals are using inode times?  Which
of those would change to "mono_time"-style timestamps?  There are
*bound* to be user-visible consequences of using a different,
incommensurate timestamp space for inode times, and for other
in-kernel usage, if timestamps from the two spaces are *ever*
compared; and I don't see off-hand how that can be avoided.

Last, to repeat myself, the lack of a fine-grain monotonic time
*is* a problem.  I agreed with that, though some people seem to've
missed it.  I agree that a higher-resolution timestamp is a Good Thing.
Again, I think a nanosecond-resolution mono_time replacement is even
better than Dennis's suggestion of a microsecond-resolution "elapsed
real time".  And, yes, it should be reachable from userspace.
I think a new sysctl variable is fine for a first implementation.

If the latency of walking the sysctl tree is demonstrated to be a problem,
then we can add a new syscall.
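
As a sketch of what the userspace side might look like: no sysctl name has been settled here, so the code below does not invent one.  It uses the POSIX clock_gettime() call with CLOCK_MONOTONIC purely as a stand-in for "a monotonic, nanosecond-resolution reading reachable from userspace"; read_monotime() and timespec_diff_ns() are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Difference b - a in nanoseconds; both must be normalized timespecs. */
static int64_t
timespec_diff_ns(struct timespec a, struct timespec b)
{
	assert(a.tv_nsec >= 0 && a.tv_nsec < 1000000000L);
	assert(b.tv_nsec >= 0 && b.tv_nsec < 1000000000L);
	return (int64_t)(b.tv_sec - a.tv_sec) * 1000000000LL
	    + (b.tv_nsec - a.tv_nsec);
}

/*
 * Stand-in for reading the proposed monotonic time from userspace.
 * Returns 0 on success, -1 on failure, like clock_gettime() itself.
 */
static int
read_monotime(struct timespec *ts)
{
	return clock_gettime(CLOCK_MONOTONIC, ts);
}
```

Two successive read_monotime() calls fed to timespec_diff_ns() should never give a negative interval, whatever settimeofday() does in between.  That is the property being asked for.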

I believe the best way to get this into NetBSD is to write a section 9
manpage and a draft implementation, and submit it to Someone In Authority.
I'm not quite that, yet, but I'll happily shepherd in a timespec/timeval
monotime and hooks to get at it from user-space.

For the rest, I'd like to continue with the ``exciting cut and thrust
of scientific debate''... maybe I'm being more than usually dense, but
it seems like there're still architectural issues about what's the
"right" thing to do when real time, as measured by the system, is
discontinuous.  I do think that's a hard issue, and I have not, at
all, meant to flame or unfairly disparage anyone's approaches for
dealing with it; merely to keep in mind that there are other views of
the problem.