Subject: Re: correctly counting user/sys/interrupt time
To: David Laight <david@l8s.co.uk>
From: Frank Kardel <kardel@netbsd.org>
List: tech-kern
Date: 04/11/2006 20:59:48
David Laight wrote:

>On Tue, Apr 11, 2006 at 03:01:25PM +0200, Frank Kardel wrote:
>
>>"no" for instrumenting locks or "no" for observing this on MP systems ?
>>I assume the "instrumenting locks" part.
>
>actually both...
>
Ahh, do I have to assume that my system is different, since no interrupt
delays have been observed on MP systems, or have the delays simply gone
unnoticed because nobody looked for them until now?

>
>>>In the past I used a logic analyser (in timing mode) to trigger when
>>>a device IRQ line was asserted for more than a few ms, wired the
>>>'trigger out' to the system NMI line, and thus got a system panic dump
>>>when the IRQ was masked for too long.  Turned out to be code that
>>>scrolled the VGA (text) console screen....
>>
>>Am I very wrong when I assume that this is unlikely to work
>>for lapic interrupts?
>
>This was for the interrupt from an external card...
>
I see, makes things a tad simpler :-)

>
>>I would expect the interrupt request line
>>from the lapic timer would only be accessible via major chip surgery
>>and some cooling issues on an AMD64 X2. Maybe there are other
>>accessible interrupt lines, but I am currently leaning towards "in line"
>>time stamping in order to get some grip on what delays clock interrupts
>>over 10%.
>
>If the delay is hardclock -> softclock, then you 'just' need to know
>the place that the hardclock interrupt interrupted when there is
>a long delay.
>
I just looked into locore.S (I'd rather look at 68k code, which at least
had a notion of an interrupt level...).
lapic_clockintr is the hardclock interrupt, and the only things able to
delay it would be disabled interrupts or a higher interrupt level, if I see
that right (set me straight if I am missing something here - I am not an
x86 guru).
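
To make the two candidate causes above concrete, here is a minimal sketch of
in-line time stamping around the sections that block the clock interrupt. It
is a user-space model, not NetBSD code: struct block_trace, trace_block() and
trace_unblock() are hypothetical names, timestamps are passed in by the
caller, and disabled interrupts and a raised interrupt level are treated
alike. It only remembers which outermost section held the block longest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bookkeeping for one CPU; not an existing kernel structure. */
struct block_trace {
    uint64_t    start_ns;     /* when the outermost block began */
    const char *owner;        /* tag of the code that began it */
    int         depth;        /* nesting depth of block/unblock pairs */
    uint64_t    worst_ns;     /* longest blocked interval seen so far */
    const char *worst_owner;  /* tag of the code responsible for it */
};

/* Call where interrupts get disabled or the level is raised. */
void trace_block(struct block_trace *t, const char *who, uint64_t now_ns)
{
    if (t->depth++ == 0) {
        /* Only the outermost section starts the clock. */
        t->start_ns = now_ns;
        t->owner = who;
    }
}

/* Call where interrupts get re-enabled or the level is lowered again. */
void trace_unblock(struct block_trace *t, uint64_t now_ns)
{
    assert(t->depth > 0);     /* unbalanced unblock would be a caller bug */
    if (--t->depth == 0) {
        uint64_t held = now_ns - t->start_ns;
        if (held > t->worst_ns) {
            t->worst_ns = held;
            t->worst_owner = t->owner;
        }
    }
}
```

A zero-initialized struct block_trace is a valid starting state. In a real
kernel this would presumably have to be per-CPU and hooked into the spl and
interrupt enable/disable paths, which this sketch does not attempt.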

> You might need to determine stack offsets back to earlier
>return addresses for addresses within certain functions (or force
>things to be inlined); a hand-generated table might suffice.
>
For that it would probably be sufficient to call the debugger when detecting
a delayed interrupt to get the stack trace. But that wouldn't give me the
cause of the delay (interrupts disabled for a long time, or a long time spent
at a high spl level), right?
Actually, I'd like to see the criminal and not the victim.
Detecting delays of several ms (between 1 and 91 ms today) makes me feel
uneasy. Either my delta-t checker is wrong or we have something going on in
this area. This behavior may be related to what we experience in

- PR/32035: MP machines can't keep time on busy nameservers

(also marked critical for 4.0)
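
For readers wondering what a delta-t check of this kind might look like, here
is a minimal sketch. It is not the checker used for the measurements above:
struct dt_checker, dt_init() and dt_tick() are hypothetical names, and the
caller is assumed to supply a monotonic timestamp in nanoseconds on every
clock interrupt. A tick counts as delayed when the gap since the previous
tick exceeds the nominal period plus a slack (for example 10% of the period).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical state for one clock source; not an existing kernel type. */
struct dt_checker {
    uint64_t period_ns;    /* nominal tick period, e.g. 10 ms if HZ=100 */
    uint64_t slack_ns;     /* tolerated lateness, e.g. 10% of the period */
    uint64_t last_ns;      /* timestamp of the previous tick */
    uint64_t max_late_ns;  /* worst lateness seen so far */
    unsigned late_ticks;   /* number of ticks that exceeded the slack */
    int      primed;       /* 0 until the first tick has been seen */
};

void dt_init(struct dt_checker *c, uint64_t period_ns, uint64_t slack_ns)
{
    assert(period_ns > 0);
    c->period_ns = period_ns;
    c->slack_ns = slack_ns;
    c->last_ns = 0;
    c->max_late_ns = 0;
    c->late_ticks = 0;
    c->primed = 0;
}

/*
 * Call once per clock interrupt. Returns how late the tick was in ns,
 * or 0 if it arrived within period + slack of the previous one.
 */
uint64_t dt_tick(struct dt_checker *c, uint64_t now_ns)
{
    uint64_t late = 0;

    if (c->primed) {
        uint64_t delta = now_ns - c->last_ns;
        if (delta > c->period_ns + c->slack_ns) {
            late = delta - c->period_ns;
            c->late_ticks++;
            if (late > c->max_late_ns)
                c->max_late_ns = late;
        }
    }
    c->primed = 1;
    c->last_ns = now_ns;
    return late;
}
```

A check like this only sees the victim (the late tick); it says nothing about
which code kept the interrupt from being delivered.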


>This shouldn't be too hard to find.
>
oookay, so somebody just needs to have the right idea here :-) && :-(

>	David
>
Frank