Subject: re: SMP & ntpd interaction?
To: Jon Buller <jon@bullers.net>
From: matthew green <mrg@eterna.com.au>
List: port-sparc
Date: 01/17/2003 16:43:36
   I noticed this recently, but don't remember it happening "a while
   ago" (So it may be hardware going bad, or a break in -current, or
   SMP borking ntpd...)  Anyone else seeing this?  My LX running 1.6
   is fine, but this SS20 doesn't seem to like to keep time well.
   
   (Could it be because I have an SM81 for cpu0 and an SM71 for cpu1?)
   
   Jan 16 13:09:39 ra ntpd[192]: time reset 10.702596 s
   Jan 16 13:30:11 ra ntpd[192]: time reset 9.862815 s
   Jan 16 13:51:46 ra ntpd[192]: time reset 10.654504 s
   Jan 16 14:12:22 ra ntpd[192]: time reset 9.185279 s
   Jan 16 14:33:00 ra ntpd[192]: time reset 10.818983 s
   Jan 16 14:53:37 ra ntpd[192]: time reset 7.726530 s
   Jan 16 15:14:10 ra ntpd[192]: time reset 9.213057 s
   Jan 16 15:35:58 ra ntpd[192]: time reset 11.109225 s
   Jan 16 15:56:31 ra ntpd[192]: time reset 8.270889 s
   Jan 16 16:17:12 ra ntpd[192]: time reset 11.428772 s
   Jan 16 16:38:49 ra ntpd[192]: time reset 11.505207 s
   Jan 16 16:59:33 ra ntpd[192]: time reset 11.574271 s
   Jan 16 17:20:08 ra ntpd[192]: time reset 11.889465 s
   Jan 16 17:40:41 ra ntpd[192]: time reset 8.068915 s

well i see broken rusage(3) stats a lot from, e.g., tcsh...  usually
for longer running processes, or maybe those that fork as well.. i
wonder if this is related.  normally time lossage like this is
associated with missing clock interrupts, but you're seeing huge
differences in not very much time - 10 seconds every 20 minutes
(1200 seconds) is a lot.  umm, that's nearly one clock interrupt
(out of 100) lost every second.  i do notice that all of these are
forward time resets.
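
to spell out the arithmetic behind that estimate, here is a minimal sketch.  it assumes a 100 Hz clock interrupt rate (the "out of 100" above) and uses the rounded figures of 10 seconds per 1200 seconds; the function name is made up for illustration only.

```python
HZ = 100  # assumed clock interrupt rate, taken from the "out of 100" above


def lost_ticks_per_second(reset_s, interval_s, hz=HZ):
    """Average number of clock interrupts missed per second of real time.

    reset_s:    size of each forward time reset, in seconds
    interval_s: time between consecutive resets, in seconds
    """
    # Fraction of time lost, scaled by the number of ticks per second.
    return reset_s / interval_s * hz


# Rounded figures from the log: about 10 s lost every 20 minutes (1200 s).
print(round(lost_ticks_per_second(10.0, 1200.0), 2))  # prints 0.83
```

so roughly 0.83 of every 100 ticks would be going missing, if lost clock interrupts really are the cause.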
   
   PS  pk, that spl patch for lock debug works great.  Haven't seen
   a single "xcall... can't ping cpuX" since.

yup.  me too.  pk, is it the right answer?