Subject: Re: kern/32035: APIC timer help
To: None <kern-bug-people@netbsd.org, gnats-admin@netbsd.org,>
From: Simon Burge <simonb@wasabisystems.com>
List: netbsd-bugs
Date: 11/28/2005 01:38:02
The following reply was made to PR kern/32035; it has been noted by GNATS.

From: Simon Burge <simonb@wasabisystems.com>
To: tech-kern@netbsd.org
Cc: gnats-bugs@netbsd.org, kern-bug-people@netbsd.org,
	gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/32035: APIC timer help
Date: Mon, 28 Nov 2005 12:37:14 +1100

 [ Cross-posting to tech-kern to try to get some help on working
   out WTF if going on here. ]
 
 Hi folks,
 
 Can anyone explain how the x86 local APIC timer is supposed to work?
 
 For some reason, just keeping the box the shows the symptoms in the PR
 busy with builds doesn't make time unhappy, but a busy named does.  This
 box is a Dell (I think) dual Xeon with an Intel E7525 chipset.
 
 I've written a couple of programs that sort of simulate the
 system call sequences that the busy named generates (these are at
 ftp://ftp.NetBSD.org/pub/NetBSD/misc/simonb/named-simul.tar.gz ).
 Running "receiver" and "sender <host> 2000000 2000" on the same box lead
 it to totally lose time the same way that named makes it loose time.
 
 On that particular i386 MP box when idle, we get to the top of
 lapic_clockintr() approx every 29926950 TSC ticks in most cases.  Here's
 some data showing the TSC cycles between lapic_clockintr() calls
 (measured only on the primary CPU):
  
 	tsc from last clockintr
 	                 tsc cycles over 29926950
 
 	29926950                0
 	29926950                0
 	29926950                0
 	29950837            23887
 	29926973               23
 	29926950                0
 	29926950                0
 	29926950                0
 	29926950                0
 	29926950                0
 
 I would have expected that if we're late getting to lapic_clockintr()
 because say interrupts were blocked at the time, then we'd get to
 the next (or at least some subsequent) call to lapic_clockintr() in
 the future early WRT the TSC tick count - if the local APIC timer
 interrupts are really coming in at a fixed rate.  So above I'd expect to
 see something like:
 
 	29926950                0
 	29950837            23887
 	29903063           -23887
 	29926950                0
 
 and thus keeping an average of 29926950 TSC ticks per call to
 lapic_clockintr().
 
 Occasionly I do see little cycles like:
 
 	tsc from last clockintr
 	                 tsc cycles over 29926950
 
 	29926950                0
 	29927025               75
 	29926875              -75
 	29926935              -15
 	29926965               15
 	29926950                0
 
 and
 
 	29926957                7
 	29926950                0
 	29926943               -7
 	29926950                0
 	29926957                7
 	29926943               -7
 	29926950                0
 	29927092              142
 	29926808             -142
 	29926950                0
 
 where we miss a few TSC ticks but then make those up in the next call
 to lapic_clockintr() or two, which is what I'd expect to see, but this
 never happens where any significant number of TSC ticks are missed, only
 small values like above.
 
 However, while the system is busy running the programs mentioned above
 we see sequences like the following:
 
 	tsc from last clockintr
 	                 tsc cycles over 29926950
 
 	29926950                0
 	30573397           646447 
 	30620333           693383 
 	30596385           669435 
 	30739912           812962 
 	30812250           885300
 	30691905           764955
 	30440408           513458
 	30082155           155205
 	30453150           526200
 	30525195           598245
 	30571875           644925
 	30692910           765960
 	30691215           764265
 	30501015           574065
 	30465150           538200
 	30062040           135090
 	30472365           545415
 	30596445           669495
 	30070425           143475
 	29926995               45
 	29926950                0
 
 where we have a much larger than normal delay between lapic_clockintr()
 calls.
 
 Here's some further data.  First here's three sets of the total number
 of TSC ticks for 10,000 lapic_clockintr() calls while the box is idle:
 
 	299274903660
 	299273899440
 	299274126585
 
 and three sets while it's busy running the above programs:
 
 	302256563415
 	302291485440
 	302242349520
 
 The busy TSC ticks are pretty close to 1% over the idle TSC ticks, and
 the clock on the box has lost approximately 23 seconds in 40 minutes
 which is also close to 1% time loss.
 
 
 Looking at i386/i386/vector.S, the lapic_ltimer interrupt vector sends
 an end-of-interrupt (EOI) to the APIC and then calls lapic_clockintr().
 Is it somehow possible that if there's a delay in sending the EOI that
 the APIC timer won't restart counting down automatically?
 
 Any other theories to describe the behaviour we're seeing?
 
 I also have a dual Xeon here based on the Intel E7505 chipset that
 doesn't seem to be affected by this problem, so it would seem that
 it may be chipset related.
 
 Simon.
 --
 Simon Burge                            <simonb@wasabisystems.com>
 NetBSD Support and Service:         http://www.wasabisystems.com/