Subject: Re: kern/32035: APIC timer help
To: None <tech-kern@netbsd.org>
From: Simon Burge <simonb@wasabisystems.com>
List: tech-kern
Date: 11/28/2005 12:37:14
[ Cross-posting to tech-kern to try to get some help on working
  out WTF if going on here. ]

Hi folks,

Can anyone explain how the x86 local APIC timer is supposed to work?

For some reason, just keeping the box the shows the symptoms in the PR
busy with builds doesn't make time unhappy, but a busy named does.  This
box is a Dell (I think) dual Xeon with an Intel E7525 chipset.

I've written a couple of programs that sort of simulate the
system call sequences that the busy named generates (these are at
ftp://ftp.NetBSD.org/pub/NetBSD/misc/simonb/named-simul.tar.gz ).
Running "receiver" and "sender <host> 2000000 2000" on the same box lead
it to totally lose time the same way that named makes it loose time.

On that particular i386 MP box when idle, we get to the top of
lapic_clockintr() approx every 29926950 TSC ticks in most cases.  Here's
some data showing the TSC cycles between lapic_clockintr() calls
(measured only on the primary CPU):
 
	tsc from last clockintr
	                 tsc cycles over 29926950

	29926950                0
	29926950                0
	29926950                0
	29950837            23887
	29926973               23
	29926950                0
	29926950                0
	29926950                0
	29926950                0
	29926950                0

I would have expected that if we're late getting to lapic_clockintr()
because say interrupts were blocked at the time, then we'd get to
the next (or at least some subsequent) call to lapic_clockintr() in
the future early WRT the TSC tick count - if the local APIC timer
interrupts are really coming in at a fixed rate.  So above I'd expect to
see something like:

	29926950                0
	29950837            23887
	29903063           -23887
	29926950                0

and thus keeping an average of 29926950 TSC ticks per call to
lapic_clockintr().

Occasionly I do see little cycles like:

	tsc from last clockintr
	                 tsc cycles over 29926950

	29926950                0
	29927025               75
	29926875              -75
	29926935              -15
	29926965               15
	29926950                0

and

	29926957                7
	29926950                0
	29926943               -7
	29926950                0
	29926957                7
	29926943               -7
	29926950                0
	29927092              142
	29926808             -142
	29926950                0

where we miss a few TSC ticks but then make those up in the next call
to lapic_clockintr() or two, which is what I'd expect to see, but this
never happens where any significant number of TSC ticks are missed, only
small values like above.

However, while the system is busy running the programs mentioned above
we see sequences like the following:

	tsc from last clockintr
	                 tsc cycles over 29926950

	29926950                0
	30573397           646447 
	30620333           693383 
	30596385           669435 
	30739912           812962 
	30812250           885300
	30691905           764955
	30440408           513458
	30082155           155205
	30453150           526200
	30525195           598245
	30571875           644925
	30692910           765960
	30691215           764265
	30501015           574065
	30465150           538200
	30062040           135090
	30472365           545415
	30596445           669495
	30070425           143475
	29926995               45
	29926950                0

where we have a much larger than normal delay between lapic_clockintr()
calls.

Here's some further data.  First here's three sets of the total number
of TSC ticks for 10,000 lapic_clockintr() calls while the box is idle:

	299274903660
	299273899440
	299274126585

and three sets while it's busy running the above programs:

	302256563415
	302291485440
	302242349520

The busy TSC ticks are pretty close to 1% over the idle TSC ticks, and
the clock on the box has lost approximately 23 seconds in 40 minutes
which is also close to 1% time loss.


Looking at i386/i386/vector.S, the lapic_ltimer interrupt vector sends
an end-of-interrupt (EOI) to the APIC and then calls lapic_clockintr().
Is it somehow possible that if there's a delay in sending the EOI that
the APIC timer won't restart counting down automatically?

Any other theories to describe the behaviour we're seeing?

I also have a dual Xeon here based on the Intel E7505 chipset that
doesn't seem to be affected by this problem, so it would seem that
it may be chipset related.

Simon.
--
Simon Burge                            <simonb@wasabisystems.com>
NetBSD Support and Service:         http://www.wasabisystems.com/