Subject: cc_microtime() bug makes gettimeofday() go backwards?
To: None <tech-kern@netbsd.org>
From: Nathan J. Williams <nathanw@wasabisystems.com>
List: tech-kern
Date: 05/06/2004 19:03:29
While investigating timekeeping on my workstation, I found that I
suffer from the "gettimeofday() goes backwards" problem described in
PR kern/14058, which includes a simple test program. After some poking
around in kern_microtime.c, the following code at the end of
cc_microtime() caught my eye:
	/*
	 * Ordinarily, the current clock time is guaranteed to be later
	 * by at least one microsecond than the last time the clock was
	 * read. However, this rule applies only if the current time is
	 * within one second of the last time. Otherwise, the clock will
	 * (shudder) be set backward. The clock adjustment daemon or
	 * human equivalent is presumed to be correctly implemented and
	 * to set the clock backward only upon unavoidable crisis.
	 */
	simple_lock(&microtime_slock);
	sec = lasttime.tv_sec - t.tv_sec;
	usec = lasttime.tv_usec - t.tv_usec;
	if (usec < 0) {
		usec += 1000000;
		sec--;
	}
	if (sec == 0) {
		t.tv_usec += usec + 1;
		if (t.tv_usec >= 1000000) {
			t.tv_usec -= 1000000;
			t.tv_sec++;
		}
	}
	lasttime = t;
	simple_unlock(&microtime_slock);
This appears to be trying to prevent t.tv_usec from going backwards in
the short term by correcting it relative to the last time read out.
However, on modern fast machines, two calls to microtime() can occur
within one microsecond. When this happens, t equals lasttime, so sec
and usec are both 0, the "sec == 0" test fires, and t.tv_usec is
improperly advanced beyond the current time. On subsequent calls, t
then appears to be in the past compared to lasttime, so t.tv_usec is
advanced further and further by the + 1 term. This means that no more
than 999999 calls to gettimeofday() will return the same tv_sec value,
even when the call rate is faster than that (my Athlon XP 2000 can
sustain about 1,600,000 gettimeofday() calls per second).
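To make the runaway concrete, here is a small user-space simulation.
It is a sketch, not kernel code: correct() copies the tail of
cc_microtime() onto plain struct timevals, while simulate() and its
clock model (one microsecond tick every calls_per_usec reads) are
hypothetical. With one read per microsecond nothing gets corrected;
with two reads per microsecond every result is forced to lasttime + 1,
so the returned time pulls steadily ahead of the underlying clock.

```c
#include <sys/time.h>

/* Hypothetical user-space copy of the correction at the end of
 * cc_microtime(), using the original "sec == 0" test. */
static void
correct(struct timeval *t, struct timeval *lasttime)
{
	long sec = lasttime->tv_sec - t->tv_sec;
	long usec = lasttime->tv_usec - t->tv_usec;

	if (usec < 0) {
		usec += 1000000;
		sec--;
	}
	if (sec == 0) {		/* lasttime >= t, within one second */
		t->tv_usec += usec + 1;
		if (t->tv_usec >= 1000000) {
			t->tv_usec -= 1000000;
			t->tv_sec++;
		}
	}
	*lasttime = *t;
}

/* Hypothetical model: read a clock `calls` times, where the clock
 * advances one microsecond every `calls_per_usec` reads. Returns how
 * many microseconds the last returned time is ahead of the clock. */
static long
simulate(long calls, long calls_per_usec)
{
	/* Start lasttime "long ago" so the first read is not corrected. */
	struct timeval last = { .tv_sec = -1, .tv_usec = 0 };
	struct timeval t = { .tv_sec = 0, .tv_usec = 0 };
	long i, real = 0;

	for (i = 0; i < calls; i++) {
		real = i / calls_per_usec;	/* true time in microseconds */
		t.tv_sec = real / 1000000;
		t.tv_usec = real % 1000000;
		correct(&t, &last);
	}
	return ((long)t.tv_sec * 1000000L + (long)t.tv_usec) - real;
}
```

In this model simulate(1000, 1) leaves the result exactly on the clock,
while simulate(1000, 2) ends 500 us ahead. Carried further, once the
lead reaches a full second the "sec == 0" test stops firing and the
raw, earlier time is returned, which here shows up as a one-second
backwards step.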
The fix I have is to change:
	if (sec == 0) {
to
	if (sec == 0 && usec > 0) {
which avoids doing any "corrections" when fast, repeated calls return
the same time. This seems to have fixed the problem on my Alpha box
(UP1500, 800MHz 21264), where the test program previously reported a
negative gettimeofday() difference about every 20 seconds, and on my
Dell PC (2.4GHz Xeon). The Athlon box now trips the error about every
7 seconds instead of every 2 seconds... which is an improvement of
sorts, but something else is still wrong.
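The effect of the one-line change can be checked with the same kind of
sketch, with the test made switchable. As before, correct() and
simulate() are hypothetical user-space stand-ins; the `fixed` argument
selects between "sec == 0" and "sec == 0 && usec > 0".

```c
#include <sys/time.h>

/* Hypothetical copy of the cc_microtime() correction; fixed != 0
 * applies the proposed "usec > 0" condition. */
static void
correct(struct timeval *t, struct timeval *lasttime, int fixed)
{
	long sec = lasttime->tv_sec - t->tv_sec;
	long usec = lasttime->tv_usec - t->tv_usec;

	if (usec < 0) {
		usec += 1000000;
		sec--;
	}
	/* Original: correct whenever lasttime >= t (within a second).
	 * Fixed: correct only when lasttime > t strictly. */
	if (sec == 0 && (!fixed || usec > 0)) {
		t->tv_usec += usec + 1;
		if (t->tv_usec >= 1000000) {
			t->tv_usec -= 1000000;
			t->tv_sec++;
		}
	}
	*lasttime = *t;
}

/* Hypothetical model: the clock ticks one microsecond every
 * `calls_per_usec` reads; returns the final lead over the clock. */
static long
simulate(long calls, long calls_per_usec, int fixed)
{
	struct timeval last = { .tv_sec = -1, .tv_usec = 0 };
	struct timeval t = { .tv_sec = 0, .tv_usec = 0 };
	long i, real = 0;

	for (i = 0; i < calls; i++) {
		real = i / calls_per_usec;
		t.tv_sec = real / 1000000;
		t.tv_usec = real % 1000000;
		correct(&t, &last, fixed);
	}
	return ((long)t.tv_sec * 1000000L + (long)t.tv_usec) - real;
}
```

Here simulate(1000, 2, 0) drifts 500 us ahead while simulate(1000, 2, 1)
stays on the clock. A reading strictly behind lasttime (within a
second) is still pulled forward to lasttime + 1, but a reading equal to
lasttime is returned unchanged. The trade-off is that two back-to-back
calls can now return identical times, so results are non-decreasing
rather than strictly increasing as the comment in the code promises.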
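The test program from the PR is not reproduced here. As a rough
sketch of the kind of check involved, assuming it simply compares
successive gettimeofday() results, something like the following would
count backwards steps; tv_diff_usec() and count_backward_steps() are
hypothetical helpers, not part of the PR.

```c
#include <stddef.h>
#include <sys/time.h>

/* Difference a - b in microseconds (negative if a is earlier). */
static long long
tv_diff_usec(const struct timeval *a, const struct timeval *b)
{
	return (long long)(a->tv_sec - b->tv_sec) * 1000000LL +
	    (long long)(a->tv_usec - b->tv_usec);
}

/* Call gettimeofday() `iterations` times and count how many results
 * are earlier than the one before them. */
static long
count_backward_steps(long iterations)
{
	struct timeval prev, now;
	long i, backward = 0;

	gettimeofday(&prev, NULL);
	for (i = 0; i < iterations; i++) {
		gettimeofday(&now, NULL);
		if (tv_diff_usec(&now, &prev) < 0)
			backward++;
		prev = now;
	}
	return backward;
}
```

To catch the intervals quoted above, it would have to run for tens of
seconds, i.e. tens of millions of iterations at the call rates
mentioned earlier.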
Comments? Timekeeping hasn't really been my specialty.
- Nathan