Subject: Re: "frequency error ... exceeds tolerance"
To: None <port-alpha@NetBSD.org>
From: der Mouse <mouse@Rodents.Montreal.QC.CA>
List: port-alpha
Date: 08/21/2007 11:27:17
>> Aug 21 01:46:01 Omega ntpd[348]: frequency error -508 PPM exceeds tolerance 500 PPM

[tnn@]
> What version of NetBSD?

Doh!  My apologies.  This is 3.1.

> The fix on netbsd-3 is to define the CLKF_BASEPRI macro to 0.

This appears to be in sys/alpha/include/cpu.h, and, looking at
kern/kern_clock.c, it doesn't look as though it makes sense to casually
define it to zero.  How does that fix this problem?

[Izumi Tsutsui]
> Hmm, this may indicate that spllowersoftclock(9) has some problem.
> (all interrupts are blocked during softclock()?)
> 
> Other possible fix is ["options HZ=1024"] in kernel config file,
> which could adjust tick and tickadj variables properly.

I may try that....
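
For what it's worth, here is a rough sketch of why HZ and the tick
variable could matter at this order of magnitude.  It assumes (not
checked against the 3.1 source) that tick is an integer count of
microseconds computed as 1000000/hz; tick_error_ppm is just a name made
up for illustration.  It shows that a rate such as 1024, which does not
divide a million evenly, leaves a truncation error of several hundred
PPM.  It does not establish that this is what is happening on Omega.

```python
# Sketch only: PPM error from truncating the per-tick microsecond count.
# Assumption: tick = 1000000 // hz (integer division), as in classic BSD
# kernels; this has not been verified against the NetBSD 3.1 source.

def tick_error_ppm(hz):
    """Microseconds gained (+) or lost (-) per real second, i.e. PPM."""
    tick_us = 1_000_000 // hz      # integer microseconds credited per tick
    accounted_us = tick_us * hz    # microseconds credited per real second
    # One second is 10**6 us, so the shortfall in us is already in PPM.
    return accounted_us - 1_000_000

for hz in (100, 1000, 1024):
    print(hz, 1_000_000 // hz, tick_error_ppm(hz))
```

Rates that divide 1000000 evenly give zero error under this
assumption; 1024 does not.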

[Greg Troxel]
> The real question is whether the clock is consistently that slow
> (actually fast - I think that's the correction rate), or badly
> behaved.

That's a good question.  The magnitude is consistently in the range
500-512, but the sign flips back and forth (which I hadn't noticed
until just now).  To me, this indicates that the clock is very badly
behaved, and just sometimes happens to misbehave badly enough to pass
the NTP limit one way or the other.  Does that sound like a correct
interpretation?

> I would suggest upping the limit and letting it stabilize.

What does "upping the limit" mean here?  Rebuilding NTP with
NTP_MAXFREQ set higher?  Rebuilding the kernel with a higher adjustment
slew rate?

> Then do 'ntpdc -c loopinfo' once a minute and graph the frequency
> results.  Also, what do the jitter and offset columns say?

I'm now doing loopinfo once a minute (according to the machine's own
clock) and will get back to the list once I have some data from that.
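
The once-a-minute sampling could be done with something like the sketch
below.  It assumes loopinfo prints lines of the form "offset: ... s"
and "frequency: ... ppm"; that format, and the names parse_loopinfo,
sample and main, are assumptions for illustration, so adjust the
patterns to whatever your ntpdc actually prints.

```python
# Sketch only: log offset and frequency from 'ntpdc -c loopinfo'.
# Assumption: output contains "offset: <n> s" and "frequency: <n> ppm".
import re
import subprocess
import time

NUM = r"(-?\d+(?:\.\d+)?)"
OFFSET_RE = re.compile(r"^offset:\s*" + NUM + r"\s*s", re.MULTILINE)
FREQ_RE = re.compile(r"^frequency:\s*" + NUM + r"\s*ppm", re.MULTILINE)

def parse_loopinfo(text):
    """Return (offset_s, frequency_ppm); a field is None if not found."""
    off = OFFSET_RE.search(text)
    freq = FREQ_RE.search(text)
    return (float(off.group(1)) if off else None,
            float(freq.group(1)) if freq else None)

def sample():
    """Run ntpdc once and parse its loopinfo output."""
    out = subprocess.run(["ntpdc", "-c", "loopinfo"],
                         capture_output=True, text=True, check=True).stdout
    return parse_loopinfo(out)

def main(interval=60):
    """Print 'timestamp offset frequency' every interval seconds."""
    while True:
        offset, freq = sample()
        # The timestamp comes from the same clock that is under suspicion.
        print(int(time.time()), offset, freq, flush=True)
        time.sleep(interval)

# To start logging, call main() and redirect the output to a file.
```

The resulting three-column output can be fed to any plotting tool to
graph frequency against time.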

/~\ The ASCII				der Mouse
\ / Ribbon Campaign
 X  Against HTML	       mouse@rodents.montreal.qc.ca
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B