Subject: Re: NTP clock drift worsened around June 20?
To: Erik E. Fair <fair@clock.org>
From: Jonathan Stone <jonathan@DSG.Stanford.EDU>
List: tech-kern
Date: 08/06/1997 01:33:02
"Erik E. Fair" (Time Keeper) <fair@clock.org> writes:

 >Perhaps the scheduler? I wonder how much of the variation in NTP is the
 >vagaries of the existing scheduler; a re-examination of its assumptions in
 >the light of both modern systems architecture, and the various uses that
 >the system gets put to would be a very good thing to do. We should also
 >give a thought to adding some facility for making real-time scheduling
 >latency guarantees for things like the NTP daemon, CD-R writers, X window
 >system managers, and so on.

Yes, yes, and yes :).

I got into NetBSD because it let me collect kernel profiles with a
this-decade-ish TCP (as opposed to, say, the last-decade-ish TCP in
Ultrix).

I've done more pmax microbenchmarking and tuning since finishing the
mips3 merge. _Wall_ time for a kernel build on a mips3 (r4000, r4400)
has improved by over 10%.  lmbench numbers (modified, since one of the
programs doesn't compile on mips with GCC) have improved by up to 50%.
The current NTP problem is an anomaly that has me puzzled.

The immediate problems, both the poor NTP performance and the PDMA
overruns (on some, but not all, 68k ports), smell interrupt-related
to me; we really need tools to find out _what_ is going wrong.
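The kind of tool I mean can be dead simple.  Here's a hypothetical
sketch (made-up names, not existing NetBSD code), assuming the port
has a free-running counter to read, like the mips3 CP0 Count register
or the Pentium cycle counter; hook it into hardclock() and dump the
histogram from ddb or a sysctl:

/*
 * Hypothetical sketch: histogram how late the clock interrupt is
 * dispatched, measured against a free-running cycle counter.
 * read_cycles() and CYCLES_PER_TICK are stand-ins for whatever the
 * port provides.
 */
#define NBUCKETS        16

static unsigned long    latency_hist[NBUCKETS];
static unsigned long    last_count;

void
hardclock_latency(void)
{
        unsigned long now, late;
        int b;

        now = read_cycles();            /* port-specific */
        late = (now - last_count) - CYCLES_PER_TICK;
        last_count = now;

        /*
         * log2 buckets; chronically late interrupts pile up on the
         * right.  The first sample after boot is garbage, and early
         * (wrapped) samples land in the top bucket.
         */
        for (b = 0; late != 0 && b < NBUCKETS - 1; b++)
                late >>= 1;
        latency_hist[b]++;
}

Crude, but it would at least show whether clock interrupts are being
held off, and by roughly how much.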

The latest release of SimOS purportedly runs on x86; has anyone looked
at using it with NetBSD???

A stupidly simple tool I use a lot is ntptime -c; it can find changes
in syscall overhead due to cache conflicts (in a direct-mapped cache).
Christos kindly imported ntptime into our tree, but the current
version is broken (it installs a signal handler inside the syscall
loop, duh!).  I guess I should fix that; meanwhile the 3.5F version is
better.
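
The measurement loop itself is trivial.  A minimal sketch (not the
actual ntptime source) of the timing part, with the one-time setup
kept outside the loop:

/*
 * Minimal sketch, not the real ntptime code: estimate syscall
 * overhead by timing repeated ntp_gettime() calls.  Any one-time
 * setup, e.g. a signal handler to catch a missing syscall, belongs
 * outside the timed loop; putting it inside is exactly the bug in
 * the imported version.
 */
#include <sys/time.h>
#include <sys/timex.h>
#include <stdio.h>

#define NLOOPS  10000

int
main(void)
{
        struct ntptimeval ntv;
        struct timeval before, after;
        double usec;
        int i;

        gettimeofday(&before, NULL);
        for (i = 0; i < NLOOPS; i++)
                (void)ntp_gettime(&ntv);        /* nothing else here */
        gettimeofday(&after, NULL);

        usec = (after.tv_sec - before.tv_sec) * 1e6 +
            (after.tv_usec - before.tv_usec);
        printf("%.2f us per ntp_gettime()\n", usec / NLOOPS);
        return (0);
}

If the per-call number jumps after relinking the kernel, that's the
direct-mapped cache conflict showing up.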


Better real-time scheduling might (or might not) fall out, along with
pre-emptibility, from the SMP work Charles is doing; I dunno.


 >The overarching issue is that we should probably start doing some
 >benchmarks to see if we can identify where the kernel time is going (and
 >see about other overheads), and preferably tune things a bit before the 1.3
 >release (whenever that is going to be). I just finished reading the paper
 >that Kevin Lai and Mary Baker did (and that you consulted on) on
 >FreeBSD/Linux/Solaris performance on a 100MHz Pentium;

Uh, note that collecting all that data took Kevin a _long_ time.  Some
of that was in getting FreeBSD, Solaris, and Linux to coexist on a
single disk, which wouldn't be an issue here; I don't know how the
rest broke down into data-gathering and analysis.

I think it's also a good idea to contrast that paper with Larry
McVoy's lmbench paper and Aaron Brown's SIGMETRICS '97 paper.  (Mmm,
re the Pentium-tuned bcopy thread: I think Kevin deserves some credit
for noting the wins of tuning for the no-allocate-on-write-miss
Pentium cache, not just the FreeBSD team.)
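
For anyone who missed that thread, the trick is roughly this: the P5
data cache doesn't allocate a line on a write miss, so a naive store
loop sends every destination word straight to memory; reading one
word per destination line first pulls the line into the cache so the
stores hit.  A minimal C sketch of the idea (nothing like the real
assembly bcopy, and assuming a 32-byte line):

/*
 * Sketch of the no-allocate-on-write-miss trick, not FreeBSD's
 * actual bcopy: pre-read one byte per destination cache line so the
 * following stores hit in the cache instead of going straight to
 * memory.  The real code is in assembly and interleaves the touch
 * with the copy.
 */
#include <stddef.h>

#define LINESIZE        32

void
touch_copy(const char *src, char *dst, size_t len)
{
        volatile char sink;     /* keeps the dead reads alive */
        size_t i;

        /* touch each destination line to force allocation */
        for (i = 0; i < len; i += LINESIZE)
                sink = dst[i];

        for (i = 0; i < len; i++)       /* now the stores hit */
                dst[i] = src[i];
}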


 >offhand, I'd say
 >collecting that data for NetBSD 1.2 on each platform it supported as a
 >baseline would be a place to start. Then we go get it for each platform on
 >-current, and see what improved, and what got worse...

Yes, I think we should.

My only disagreement is over what the data-collection baseline should
be: 1.2, 1.2.1, or the first freeze for 1.3.  It's probably
port-dependent, too.  For example, replacing the kernel bcopy() in the
pmax port speeds up so many things that comparison against 1.2 or
1.2.1 isn't very useful.


But this whole idea ignores a basic point: NetBSD people -- even Core
members -- just don't seem particularly interested in these kinds of
performance issues.  I've already suggested collecting lmbench numbers
and reviewing them periodically in another (more private) forum; that
went nowhere.

And once in the past I pointed out basic flaws in earlier performance
tuning: I went to the effort of instrumenting a kernel, and gathered
and presented statistics that showed the microbenchmark being used was
grossly unrepresentative of real workloads.

Which was totally ignored.


I'm not sure what the point is in accumulating data showing
performance problems if nobody's going to _do_ anything with them.