Re: Flaky timekeeping?

To: port-next68k%netbsd.org@localhost
Subject: Re: Flaky timekeeping?
From: Mouse <mouse%Rodents-Montreal.ORG@localhost>
Date: Tue, 7 Jan 2014 03:11:50 -0500 (EST)

[top-posting and no-trimming damage repaired manually]
>> [...slab running...]
>> And the clock is drifting, and, worse, drifting irregularly.

>> I started ntpd as a broadcast client [...].  Here's what I get
>> querying it every 64 seconds from another machine:

>> [time drifting by about 0.5%, but irregularly: queried at 64-second
>> intervals, offsets are 0.509586, 1.322884, 1.640805, 1.955022,
>> 1.777081, 2.075839, 2.888184, 3.205274, 3.034869, 3.854038]

> The trick is that you need ntpd, the network time daemon, to
> regularly access a reliable network-based clock, and then the drift
> can be calculated and corrected.  Only once ntpd is set up and has
> learned the drift do you have accurate time.

As I said, I was using ntpd.  The other host broadcasting chime was/is
running at stratum 4; my house LAN's time is ultimately referred to the
NRC's atomic clock (I'm using one of their public stratum-2 servers as
one of my house time references).

The problem is not that the clock drifts (that's an issue, but a
relatively minor one).  The problem is that the clock drifts
*irregularly*.  ntpd can't learn an irregular drift, at least not when
the irregularities are of the order seen above: depending on the
particular 64-second interval, the drift varies from -0.278033% to
1.27995%, with a mean of 0.580634%.  (ntpd also refuses to handle too
much drift; 0.58% is 5800ppm, well outside its tolerance range.)
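
The per-interval figures above can be reproduced from the quoted offsets.
The following is a minimal sketch, assuming the offsets are in seconds and
the queries were exactly 64 seconds apart; the names interval_drifts and
INTERVAL are made up for illustration.

```python
# Offsets (seconds) copied from the quoted ntpd query results.
offsets = [0.509586, 1.322884, 1.640805, 1.955022, 1.777081,
           2.075839, 2.888184, 3.205274, 3.034869, 3.854038]
INTERVAL = 64.0  # assumed exact spacing between queries, in seconds


def interval_drifts(offsets, interval):
    """Drift over each interval, as a percentage of elapsed time."""
    return [100.0 * (b - a) / interval for a, b in zip(offsets, offsets[1:])]


drifts = interval_drifts(offsets, INTERVAL)
mean_drift = sum(drifts) / len(drifts)

for d in drifts:
    print(f"{d:9.6f}%")
print(f"min {min(drifts):.6f}%  max {max(drifts):.6f}%  mean {mean_drift:.6f}%")
```

The spread between the smallest and largest interval is what matters here:
a steady drift would give nearly identical values on every line.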

> Another problem with my memory is that I'm not sure I know the
> "normal" way to run ntpd.

Well, unless there's something next68k-specific, I do - or, at least,
the only troubles I've had before with it (even under the same OS rev I
was using here) have been either (a) machines like mac68k where clock
interrupt hardware priority is low enough to practically guarantee bad
timekeeping or (b) machines where the drift, while steady, is well
outside the range ntpd is willing to tolerate.  I even had a host in
the NTP Pool project until they brought in their ridiculous (from my
POV) terms of service for pool members.  This irregular drift is
definitely not (b), leading me to speculate that it might be (a).

I'd need to fudge the base tick value to get the drift within ntpd's
tolerance range.  I could do that, but until the source of the
variation is fixed, there's little point.
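
As a rough sketch of what such a fudge would involve: the arithmetic below
assumes a nominal tick of 10000 microseconds (hz = 100), assumes a positive
drift means the local clock gains time, and takes 500 ppm as ntpd's
frequency limit.  None of those three values comes from this thread, and the
function names are hypothetical.

```python
def corrected_tick(nominal_tick_us, drift_fraction):
    """Integer tick length (microseconds) that cancels a steady drift.

    drift_fraction > 0 is taken to mean the local clock gains time.
    """
    return round(nominal_tick_us / (1.0 + drift_fraction))


def residual_ppm(nominal_tick_us, tick_us, drift_fraction):
    """Drift left over after the tick change, in parts per million."""
    return (tick_us * (1.0 + drift_fraction) / nominal_tick_us - 1.0) * 1e6


NOMINAL_TICK_US = 10000   # assumption: hz = 100
NTPD_LIMIT_PPM = 500.0    # assumption: ntpd's usual frequency limit
mean_drift = 0.00580634   # mean drift quoted in the message (0.580634%)

tick = corrected_tick(NOMINAL_TICK_US, mean_drift)
left = residual_ppm(NOMINAL_TICK_US, tick, mean_drift)
print(tick, round(left, 1), abs(left) < NTPD_LIMIT_PPM)
```

This only removes the mean.  With interval-to-interval variation as large as
that shown earlier, the residual on any given interval would still swing far
beyond what ntpd can track.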

> In any case, I do not believe that interrupt priority is the culprit,
> here.  Instead, I believe it's just that the crystal-based timer tick
> is not a convenient multiple of time, and thus the accumulated total
> drifts away from real time very quickly.

I now think it's not just that.  Since writing my list mail, I've tried
1.4T (I forget what pushed me to - compiler taking a very long time to
run maybe?).  Under 1.4T, the drift is much closer to constant.  xntpd
(1.4T uses xntpd, not ntpd) showed drift too, but, as with ntpd on
4.0.1, the drift was large enough that it wouldn't sync; I switched to
running a shell loop "while sleep 300; do ntpdate 10.0.1.1; done" in
the background, a much cruder variant of the same thing.  I have a much
larger sample size here; I have 124 step values, with a min of
1.117930, a max of 1.272906, and a mean of 1.17764, for a mean drift of
about 0.39%.  The job mix is roughly the same (a build of the world),
though, if the userland job mix affects timekeeping, something's pretty
broken anyway.
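
The step values convert to drift figures as follows.  This is a small
sketch that assumes each step covers exactly 300 seconds, ignoring the time
ntpdate itself takes to run; only the min, max, and mean quoted above are
used, since the full list of 124 steps is not given.

```python
STEP_INTERVAL = 300.0  # seconds, from "sleep 300"; ntpdate run time ignored

# Step sizes in seconds, as quoted in the message.
steps = {"min": 1.117930, "max": 1.272906, "mean": 1.17764}

# Drift as a percentage of elapsed time for each quoted step size.
drift_percent = {name: 100.0 * step / STEP_INTERVAL
                 for name, step in steps.items()}

for name, pct in drift_percent.items():
    print(f"{name:4s} {pct:.4f}%  ({pct * 1e4:.0f} ppm)")
```

All three values sit within a narrow band around the mean and never change
sign, unlike the 64-second samples taken under 4.0.1.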

Given the differences here, there is clearly some substantial software
difference.  Based on the drastic difference in the drift's variance, I
am inclined to blame 4.0.1.  For now I'm just sticking with 1.4T - on
this hardware the only benefit 4.0.1 offers me over it is SCSI support,
which is minor compared to half-decent timekeeping and a usable
compiler - and either just living
with the clock drift or, if I get annoyed enough, bashing the clock
tick figure to correct for an approximation to the mean drift so NTP
can take it the rest of the way.

> It may actually be normal for the drift to accumulate until some
> threshold is triggered, and then a correction brings things back to
> the reference time line.

It is not - at least not, again, unless there's something
next68k-specific involved.  None of my other 4.0.1 machines' ntpds work
that way, nor is that how I understand ntp to be designed.

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mouse%rodents-montreal.org@localhost
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Follow-Ups:
- Re: Flaky timekeeping?
  - From: Brian Willoughby

References:
- Flaky timekeeping?
  - From: Mouse
- Re: Flaky timekeeping?
  - From: Brian Willoughby
