Subject: Re: "frequency error ... exceeds tolerance"
To: der Mouse <mouse@Rodents.Montreal.QC.CA>
From: Greg Troxel <email@example.com>
Date: 08/21/2007 12:11:42
>> The real question is whether the clock is consistently that slow
>> (actually fast - I think that's the correction rate), or badly
> That's a good question. The number is always consistently in the range
> 500-512, except its sign flips back and forth (which I hadn't noticed
> until just now). To me, this indicates that the clock is very badly
> behaved, and just sometimes happens to misbehave badly enough to pass
> the NTP limit one way or the other. Does that sound like a correct
Maybe, but things are messy enough that I'd be wary of any conclusion.
NTP on the wire has various fixed-point formats, each sized to be big
enough for the need, and the kernel PLL has the same mentality. I
wouldn't be all that surprised if something were wrapping. 500 PPM
really is wacky; normally even 100 is bad. See /usr/include/sys/timex.h.
I'd run /usr/sbin/ntptime and see what that says.
>> I would suggest upping the limit and letting it stabilize.
> What does "upping the limit" mean here? Rebuilding NTP with
> NTP_MAXFREQ set higher? Rebuilding the kernel with a higher adjustment
I meant changing the threshold in the 'too far out of bounds' if test,
so the algorithm could keep running. But I'm not so sure that's a good
idea, because, as you mentioned, it runs pretty deep in the kernel.
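For illustration only, a "too far out of bounds" test has roughly the shape
sketched below, and "upping the limit" amounts to raising its tolerance. The
function name and structure are hypothetical, not ntpd's actual code; the
500 PPM default assumes ntpd's NTP_MAXFREQ is 500e-6, and the message text is
modeled on the subject line of this thread.

```python
# Hypothetical sketch of an ntpd-style frequency sanity check.
# Assumption: the default tolerance mirrors NTP_MAXFREQ = 500e-6 (500 PPM).
NTP_MAXFREQ_PPM = 500.0


def check_frequency(freq_ppm: float, tolerance_ppm: float = NTP_MAXFREQ_PPM):
    """Return None if freq_ppm is within +/-tolerance_ppm, else an error string."""
    if abs(freq_ppm) > tolerance_ppm:
        return (f"frequency error {freq_ppm:.0f} PPM exceeds "
                f"tolerance {tolerance_ppm:.0f} PPM")
    return None


if __name__ == "__main__":
    print(check_frequency(505.0))                       # trips the default limit
    print(check_frequency(505.0, tolerance_ppm=600.0))  # passes a raised limit
```

Raising tolerance_ppm only moves the cutoff in userland. As noted above, the
limit also lives deeper in the kernel, so a larger ntpd tolerance would not by
itself guarantee the loop can converge.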
The other experiment I'd try: don't run ntpd on the machine at all, and
instead periodically run something that measures its offset to another
machine (ntptrace will work, albeit kludgily).
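Once ntpd is out of the loop, the clock's raw frequency error is just the
slope of offset versus time. A minimal sketch, assuming you have collected
(time, offset) pairs in seconds from whatever tool you use; the least-squares
fit is a standard technique, and the function name and sample data here are
hypothetical, not output from ntptrace.

```python
# Sketch: estimate a free-running clock's frequency error (PPM) from
# periodic offset measurements, via an ordinary least-squares slope.
# The sign of the result depends on the offset convention of the tool used.

def drift_ppm(samples):
    """samples: list of (t_seconds, offset_seconds). Returns slope in PPM."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_o = sum(o for _, o in samples) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in samples)
    sxy = sum((t - mean_t) * (o - mean_o) for t, o in samples)
    if sxx == 0:
        raise ValueError("samples must span more than one instant")
    return sxy / sxx * 1e6  # seconds per second -> parts per million


if __name__ == "__main__":
    # Synthetic hourly samples: offset grows 1.8 s per hour.
    hourly = [(0.0, 0.0), (3600.0, 1.8), (7200.0, 3.6)]
    print(f"{drift_ppm(hourly):.1f} PPM")
```

For scale, a clock that is 500 PPM off gains or loses about 43 seconds a day
(500e-6 x 86400 s = 43.2 s), so even a few hourly samples should make the
slope obvious, and any sign flipping would show up directly.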
I dimly recall some bug on some architecture, maybe even alpha, 10 years
ago or so, where the clock code was just off, in a 1023/1024 kind of