Subject: re: SMP & ntpd interaction?
To: Jon Buller <jon@bullers.net>
From: matthew green <mrg@eterna.com.au>
List: port-sparc
Date: 01/17/2003 16:43:36
I noticed this recently, but don't remember it happening "a while
ago" (So it may be hardware going bad, or a break in -current, or
SMP borking ntpd...) Anyone else seeing this? My LX running 1.6
is fine, but this SS20 doesn't seem to like to keep time well.
(Could it be because I have an SM81 for cpu0 and an SM71 for cpu1?)
Jan 16 13:09:39 ra ntpd[192]: time reset 10.702596 s
Jan 16 13:30:11 ra ntpd[192]: time reset 9.862815 s
Jan 16 13:51:46 ra ntpd[192]: time reset 10.654504 s
Jan 16 14:12:22 ra ntpd[192]: time reset 9.185279 s
Jan 16 14:33:00 ra ntpd[192]: time reset 10.818983 s
Jan 16 14:53:37 ra ntpd[192]: time reset 7.726530 s
Jan 16 15:14:10 ra ntpd[192]: time reset 9.213057 s
Jan 16 15:35:58 ra ntpd[192]: time reset 11.109225 s
Jan 16 15:56:31 ra ntpd[192]: time reset 8.270889 s
Jan 16 16:17:12 ra ntpd[192]: time reset 11.428772 s
Jan 16 16:38:49 ra ntpd[192]: time reset 11.505207 s
Jan 16 16:59:33 ra ntpd[192]: time reset 11.574271 s
Jan 16 17:20:08 ra ntpd[192]: time reset 11.889465 s
Jan 16 17:40:41 ra ntpd[192]: time reset 8.068915 s
well i see broken rusage(3) stats a lot from, eg tcsh... usually
for longer running processes, or maybe those who fork as well.. i
wonder if this is related. normally time lossage like this is
associated with missing clock interrupts, but you're seeing huge
differences in not very much time - 10 seconds every 20 minutes
(1200 seconds) is a lot. umm, that's like nearly one clock interrupt
(out of 100) a second that is lost. i do notice that all these are
forward time resets.
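The back-of-the-envelope figure above can be checked against the log itself. The following is only a rough sketch, assuming that each "time reset" value is the offset accumulated since the previous reset and that the clock runs at 100 interrupts per second (the "out of 100" figure above). The names HZ, parse and lost_ticks_per_second are made up for illustration, and only the first three log lines are included.

```python
import re

HZ = 100  # assumed clock interrupt rate, taken from the "out of 100" estimate

# First three entries from the log quoted in this message.
LOG = """\
Jan 16 13:09:39 ra ntpd[192]: time reset 10.702596 s
Jan 16 13:30:11 ra ntpd[192]: time reset 9.862815 s
Jan 16 13:51:46 ra ntpd[192]: time reset 10.654504 s
"""

# Matches the time of day and the reset offset; the date is ignored because
# all entries fall on the same day.
LINE_RE = re.compile(
    r"(\d\d):(\d\d):(\d\d) \S+ ntpd\[\d+\]: time reset (-?\d+\.\d+) s"
)


def parse(log):
    """Return a list of (seconds_since_midnight, offset_seconds) tuples."""
    resets = []
    for line in log.splitlines():
        m = LINE_RE.search(line)
        if m:
            hh, mm, ss, offset = m.groups()
            when = int(hh) * 3600 + int(mm) * 60 + int(ss)
            resets.append((when, float(offset)))
    return resets


def lost_ticks_per_second(resets, hz=HZ):
    """Estimate lost clock interrupts per second between consecutive resets.

    Assumes the offset corrected at each reset built up entirely during the
    interval since the previous reset.
    """
    rates = []
    for (t0, _), (t1, offset) in zip(resets, resets[1:]):
        rates.append(offset / (t1 - t0) * hz)
    return rates


if __name__ == "__main__":
    for rate in lost_ticks_per_second(parse(LOG)):
        print(f"about {rate:.2f} lost ticks per second")
    # The round numbers from the text: 10 s lost in 1200 s at 100 Hz.
    print(f"estimate from the text: {10 / 1200 * HZ:.2f}")
```

If the assumptions hold, both intervals should come out a little under one lost interrupt per second, in line with the estimate in the text.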
PS pk, that spl patch for lock debug works great. Haven't seen
a single "xcall... can't ping cpuX" since.
yup. me too. pk, is it the right answer?