Subject: Variability in kernel performance due to cache conflicts?
To: None <tech-kern@netbsd.org>
From: Jonathan Stone <jonathan@DSG.Stanford.EDU>
List: tech-kern
Date: 03/15/1999 13:01:26
I've been carefully tuning microtime() to work well on pmaxes.  It was
down to 8-9 microseconds between userland calls, as reported by `ntptime
-c'.  (For comparison, a 133MHz Pentium and a 200MHz PPro are around 6 usecs.)
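
For anyone who wants a comparable number without ntptime, a rough
sketch along these lines should do.  It times gettimeofday() rather
than the NTP syscall, so the absolute figures will differ, and the
function name avg_gettimeofday_usec() is made up for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/*
 * Average cost, in microseconds, of one gettimeofday() call,
 * estimated over n back-to-back calls.  Hypothetical helper,
 * not part of ntptime or the kernel.
 */
static double
avg_gettimeofday_usec(long n)
{
	struct timeval start, now;
	long i;

	assert(n > 0);
	gettimeofday(&start, NULL);
	for (i = 0; i < n; i++)
		gettimeofday(&now, NULL);

	/* Elapsed wall-clock time covers the n calls in the loop. */
	return ((now.tv_sec - start.tv_sec) * 1e6 +
	    (now.tv_usec - start.tv_usec)) / (double)n;
}
```

Calling avg_gettimeofday_usec(100000) on two kernels built from
slightly different configs should be enough to show whether the
per-call cost has shifted.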

I then configured a new kernel with an added driver and rebooted, and the
ntp-gettime syscall latency on the 50MHz r4000 nearly doubled, to 15-16 usecs.
As best I can tell, the _only_ significant difference between the 8us
and the 16us time is cache conflicts due to differing memory layout of
the kernel: a changed memmove/memcpy and the added audio drivers.
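
To make the conflict argument concrete, here is a minimal sketch of
how two pieces of kernel text can collide in a direct-mapped cache.
The geometry (8KB, 16-byte lines) and the addresses in the usage note
are assumptions picked for illustration, not measurements from the
kernels above, and the helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed, illustrative geometry: 8KB direct-mapped, 16-byte lines. */
#define CACHE_SIZE	8192u
#define LINE_SIZE	16u
#define NUM_LINES	(CACHE_SIZE / LINE_SIZE)

/* Cache line index that an address maps to in a direct-mapped cache. */
static unsigned
cache_line_index(uint32_t addr)
{
	return (addr % CACHE_SIZE) / LINE_SIZE;
}

/*
 * Nonzero if [a, a+alen) and [b, b+blen) map to at least one common
 * cache line index, i.e. the two ranges can evict each other.
 */
static int
ranges_conflict(uint32_t a, uint32_t alen, uint32_t b, uint32_t blen)
{
	unsigned char used[NUM_LINES] = {0};
	uint32_t p;

	assert(alen > 0 && blen > 0);

	/* Mark every line index touched by the first range. */
	for (p = a & ~(LINE_SIZE - 1); p < a + alen; p += LINE_SIZE)
		used[cache_line_index(p)] = 1;

	/* Any line of the second range landing on a marked index collides. */
	for (p = b & ~(LINE_SIZE - 1); p < b + blen; p += LINE_SIZE)
		if (used[cache_line_index(p)])
			return 1;
	return 0;
}
```

With these assumed numbers, a 256-byte routine at 0x80030000 and
another at 0x80052000 share line indexes, while moving the second one
to 0x80052100 removes the overlap.  A shift of that size is easily
caused by adding or removing a driver ahead of it in the link.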

This may not sound like a lot, but to NTP time vultures, losing
a factor of 2 (one bit in advertised NTP precision) is just
Not Good Enough.

This is not new. I've documented similar but smaller effects going
back 2 or 3 years, due to adding or removing drivers.  But moving some
crucial fragments (like the memcpy() code?) into libkern, at the end of
the kernel and thus more subject to movement when kernel features are
added or removed, has increased the effect.


Has anyone else looked at micro-benchmarks like this and seen similar
effects due to cache-colouring and conflicts?  Any ideas on how to
address it inside our existing kernel build structure (short of, e.g.,
going back to hand-placing critical libkern functions inside locore)?