Subject: Help needed - time-keeping on MP machines
To: NetBSD port-sparc mailing list <port-sparc@netbsd.org>
From: Julian Coleman <jdc@coris.org.uk>
List: port-sparc
Date: 01/28/2007 18:17:05
I've upgraded a couple of MP SPARCstation 20s to 4.0BETA2 and noticed
that ntpd has trouble keeping the clock synchronised.  For example:

      remote           refid      st t when poll reach   delay   offset  jitter
 ==============================================================================
 +orthanc         83.138.191.59    3 u  231 1024  377    7.812  -18.411   7.812
 +janus           212.13.198.71    3 u  779 1024  377    7.812  -18.893   7.812
 *nakor.amazing-i 130.159.196.118  3 u  269 1024  377   61.213  -18.451   7.812

The jitter is high (the first two machines are on the local network), and
the offset swings between roughly +100 and -100 ms.  The busier the
machine, the worse the problem.  For comparison, the values on a 4/330 are:

      remote           refid      st t when poll reach   delay   offset  jitter
 ==============================================================================
 +orthanc         81.174.128.183   3 u  109  128  377    2.950    0.802   0.213
 +janus           212.13.198.71    3 u  119  128  377    3.362    0.936   0.160
 *rosehip.exnet.c 139.143.5.58     3 u  121  128  377   61.104   -3.777   2.415

I've discussed this with Frank Kardel, and I believe that the problem is
(any errors in the following are mine):

  we don't have a free-running counter that is neither tied to the clock
  interrupt nor specific to a single CPU

From reading the comments at the top of sparc/sparc/clock.c, it seems that
the sun4m counters are per-CPU.  So, using the timer-counter as the counter
source means that we are not always reading the same CPU's counter.  Since
the per-CPU counters are not synchronised with each other, successive
readings can jump forwards or backwards, which ntpd sees as jitter.
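A small user-space model of the per-CPU problem, as a sketch only: the two
skew values and the names read_percpu_counter() and observed_delta() are
hypothetical and are not taken from sparc/sparc/clock.c.  It shows that if
a reader migrates between CPUs whose counters are offset from one another,
the elapsed count it computes is wrong even though each counter on its own
is perfectly regular.

```c
#include <stdint.h>

/*
 * Hypothetical model of two unsynchronised per-CPU counters.  Each CPU's
 * counter started at a different moment, so at any instant they differ
 * by a fixed skew.  The skew values are illustrative, not measured.
 */
#define NCPU 2
static const uint32_t cpu_skew[NCPU] = { 0, 7000 };

/* What a reader sees if it happens to be running on `cpu` at `true_time`. */
static uint32_t read_percpu_counter(uint32_t true_time, int cpu)
{
    return true_time + cpu_skew[cpu];
}

/*
 * Elapsed counts between two readings, as a timekeeping layer would
 * compute it.  Unsigned subtraction copes with 32-bit wraparound, but it
 * has no way of knowing the two readings came from different CPUs.
 */
static int32_t observed_delta(uint32_t t0, int cpu0, uint32_t t1, int cpu1)
{
    return (int32_t)(read_percpu_counter(t1, cpu1) -
                     read_percpu_counter(t0, cpu0));
}
```

For what is really 1000 counts each time, observed_delta(0, 0, 1000, 0)
returns 1000, but observed_delta(0, 0, 1000, 1) returns 8000 and
observed_delta(1000, 1, 2000, 0) returns -6000.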

If we use the clock interrupt counter instead, we run into a different
problem: the clock interrupt is derived from it, so it restarts every time
it reaches the tick limit and doesn't count as a "free-running" counter
source.
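One way a counter that restarts on every tick causes trouble, assuming a
timecounter-style reader that computes elapsed counts as
(now - prev) & mask: that arithmetic is only correct if the counter wraps
at mask + 1.  The 22-bit width, the tick limit of 5000 and the function
names below are hypothetical, chosen only to illustrate the arithmetic.

```c
#include <stdint.h>

/*
 * Timecounter-style delta: subtract two raw readings and mask to the
 * counter's width.  Only correct if the hardware counts through every
 * value 0..mask, wraps to 0, and is read at least once per wrap.
 */
static uint32_t masked_delta(uint32_t prev, uint32_t now, uint32_t mask)
{
    return (now - prev) & mask;
}

/*
 * Hypothetical counter that also drives the clock interrupt: it counts
 * 0..limit-1 and restarts at 0 on every tick, so it wraps at `limit`
 * rather than at a power of two.
 */
static uint32_t tick_counter(uint64_t true_count, uint32_t limit)
{
    return (uint32_t)(true_count % limit);
}

/* Hypothetical free-running 22-bit counter for comparison (wraps at 2^22). */
static uint32_t free_counter(uint64_t true_count)
{
    return (uint32_t)(true_count & 0x3fffffu);
}
```

Across the 2^22 wrap, masked_delta(free_counter(4194000),
free_counter(4196000), 0x3fffff) still gives the true 2000.  Across a
restart of the tick counter, the same arithmetic on tick_counter(4000, 5000)
and tick_counter(6000, 5000) gives 4191304 instead of 2000.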

For reference, the FreeBSD code for sparc64 uses the clock interrupt counter
in free-running mode and derives the clock interrupt via a different source.

So, the bit where help is required is:

  how do we find a common, free-running counter that we can use for
  synchronisation?

Thanks,

J

-- 
  My other computer also runs NetBSD    /        Sailing at Newbiggin
        http://www.netbsd.org/        /   http://www.newbigginsailingclub.org/