port-alpha: Re: "frequency error ... exceeds tolerance"

Subject: Re: "frequency error ... exceeds tolerance"
To: None <port-alpha@NetBSD.org>
From: der Mouse <mouse@Rodents.Montreal.QC.CA>
List: port-alpha
Date: 08/21/2007 12:23:47

> Maybe, but things are messy enough that I'd be wary of any
> conclusion.  NTP on the wire has various fixed-point formats,
> designed to be big enough for the need.  The kernel pll has the same
> mentality.  I wouldn't be all that surprised if something were
> wrapping.  500 really is wacky - normally even 100 is bad.  See
> /usr/include/sys/timex.h.

MAXFREQ from <sys/timex.h> is 512, and...

> I'd run /usr/sbin/ntptime and see what that says.

...ntptime says, among other things, "tolerance 512 ppm".
Interestingly, the messages in /var/log/messages show numbers from -512
to -501, and 501 to 512, but nothing else - I wonder what's going on
there.  (There's also a bug in sort -n, which I will send-pr; it sorts
in order -510, -512, -511, -500, -509, -508, -507, -506, ..., -502,
-501.)

I'm now collecting output from both ntpdc -c loopinfo and ntptime, once
a minute.  Which figures are the most interesting?

> The other experiment I'd try would be to not run ntpd on the machine
> and run something (ntptrace will work, albeit kludgily) to measure
> the offset to another machine periodically.

ntpdate -q?
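
Whichever tool does the measuring, turning a series of offsets into a
frequency error is simple arithmetic.  A minimal sketch, assuming the
offsets have been read off by hand as (elapsed seconds, offset in
seconds) pairs; drift_ppm and the readings below are hypothetical, not
measurements from this machine, and the sign convention depends on which
way the tool reports the offset:

```python
def drift_ppm(samples):
    """Estimate frequency error in ppm from (elapsed_s, offset_s) pairs.

    Uses only the first and last sample; a positive result means the
    offset is growing over time.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t0, off0), (t1, off1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must span a positive time interval")
    return (off1 - off0) / (t1 - t0) * 1e6


# Hypothetical readings: the offset grows by 1.8 s over one hour.
readings = [(0, 0.0), (1800, 0.9), (3600, 1.8)]
print(f"{drift_ppm(readings):.1f} ppm")  # prints 500.0 ppm
```

With ntpd stopped, a result near the 512 ppm tolerance would suggest the
clock really is that far off, rather than something wrapping.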

> I dimly recall some bug on some architecture, maybe even alpha, 10
> years ago or so, where the clock code was just off, in a 1023/1024
> kind of way.

I did once have a machine where NTP wouldn't even sync unless I
manually stuck an unusually large value in the drift file first.  (This
was a substantially older version of NTP.)
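
For scale, here is what a 1023/1024 kind of error would amount to,
compared against the 512 ppm tolerance quoted above.  This is only a
sketch of the arithmetic; ratio_error_ppm is a hypothetical helper, and
nothing here says this machine actually has such an error:

```python
MAXFREQ_PPM = 512  # tolerance quoted earlier in this message


def ratio_error_ppm(actual, nominal):
    """Frequency error in ppm of a clock running at actual/nominal speed."""
    return (actual / nominal - 1.0) * 1e6


err = ratio_error_ppm(1023, 1024)
print(f"{err:.4f} ppm")        # prints -976.5625 ppm
print(abs(err) > MAXFREQ_PPM)  # prints True
```

An error of that size is nearly twice the tolerance, so a correction
limited to 512 ppm could never take it out.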

/~\ The ASCII				der Mouse
\ / Ribbon Campaign
 X  Against HTML	       mouse@rodents.montreal.qc.ca
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B