Subject: Re: kern/32035: APIC timer help
To: None <tech-kern@netbsd.org>
From: Simon Burge <simonb@wasabisystems.com>
List: netbsd-bugs
Date: 12/07/2005 18:02:44
Simon Burge wrote:

> [ local APIC timer problem discussed ]

I've come to the conclusion that on the problematic machines the APIC
timer just doesn't fire with a consistent period, for reasons unknown,
and that there's nothing we can really do about it.  The
patch at

   ftp://ftp.netbsd.org/pub/NetBSD/misc/simonb/mp-time-hack.diff

at least lets time run stably.  The main comment at the top of the patch
describes what it does:

	* Some MP systems have been observed to not have a
	* stable local APIC timer interrupt.  We count the
	* number of TSC cycles since the last call to
	* lapic_clockintr(), and if it has been longer than
	* expected we add in some extra time for hardclock()
	* to add in when it computes the next value of the
	* system "time" variable.  Note that we don't skip
	* time backwards - early arrivals to lapic_clockintr()
	* have only been observed sporadically, and we'll
	* soon catch up.
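
To make the idea in that comment concrete, here is a rough sketch of
the compensation step.  It is not the code from the patch: the function
name, its parameters and the microsecond units are all made up for
illustration, and it assumes the TSC frequency is already known.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical sketch, not the code from the patch: given the TSC
 * value at the previous and the current clock interrupt, return how
 * many extra microseconds hardclock() should add on top of its
 * normal tick.  All names here are assumptions.
 */
static uint64_t
lapic_extra_usec(uint64_t last_tsc, uint64_t now_tsc,
    uint64_t cycles_per_tick, uint64_t tsc_hz)
{
	uint64_t elapsed, late;

	assert(cycles_per_tick > 0 && tsc_hz > 0);

	/* Unsigned subtraction, so a wrapped TSC still gives the delta. */
	elapsed = now_tsc - last_tsc;

	/* On time or early: add nothing, never step time backwards. */
	if (elapsed <= cycles_per_tick)
		return 0;

	/*
	 * Late: convert the surplus cycles to microseconds.  The
	 * multiply can overflow only for absurdly long gaps (hours).
	 */
	late = elapsed - cycles_per_tick;
	return late * 1000000 / tsc_hz;
}
```

For example, with a 1 GHz TSC and a 100 Hz clock (10,000,000 cycles per
tick), an interrupt arriving 15,000,000 cycles after the previous one
would yield 5000 extra microseconds, and one arriving early would
yield 0.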

Longer term, switching to timecounters is a more correct fix since they
base time calculations on the TSC and not the period of the
clock interrupt.  Using HPET timers where available will also help.
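
As a rough illustration of why a counter-based approach sidesteps the
problem, the sketch below derives the current time purely from a
free-running counter, so a late or early interrupt has no effect on the
result.  This is not the timecounter API; the struct and function names
are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference point; not the real timecounter structure. */
struct tc_sketch {
	uint64_t base_count;	/* counter value at the reference point */
	uint64_t base_usec;	/* system time (usec) at that point */
	uint64_t freq_hz;	/* counter frequency in Hz */
};

/*
 * Return the time in microseconds for a given counter reading.  The
 * result depends only on how far the counter has advanced, not on
 * how many clock interrupts arrived or when.
 */
static uint64_t
tc_sketch_now_usec(const struct tc_sketch *tc, uint64_t count)
{
	uint64_t delta;

	assert(tc->freq_hz > 0);

	delta = count - tc->base_count;	/* wraparound-safe */

	/* Whole seconds and the remainder separately, to limit overflow. */
	return tc->base_usec
	    + (delta / tc->freq_hz) * 1000000
	    + (delta % tc->freq_hz) * 1000000 / tc->freq_hz;
}
```

With a 1 GHz counter, a reading 2,500,000,000 counts past the reference
point gives a time 2,500,000 microseconds past the reference time.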

Until then though, any comments on the patch as is?  Is this too ugly
to consider using in our source tree in the meantime?  Is the name of the
option (LAPIC_TIMER_IS_BUGGERED) not quite appropriate? :-)

I'd be curious whether anyone else with SMP boxes that have timekeeping
problems could test this out and see if it fixes them.

Simon.
--
Simon Burge                            <simonb@wasabisystems.com>
NetBSD Support and Service:         http://www.wasabisystems.com/