Subject: Re: correctly counting user/sys/interrupt time
To: David Laight <david@l8s.co.uk>
From: Frank Kardel <Frank.Kardel@Acrys.COM>
List: tech-kern
Date: 04/10/2006 09:25:41
David,

while you are at it, have you (or anyone else) considered
instrumenting locks (lockmanager & maybe even simple) for collecting
their locking time (min, max, accumulated, maybe even
a historam histogram)? During debugging interrupts and timecounter
implementation I got the impression that there may be
some bigger locks in the code as clock interrupts (100Hz rate)
vary by more 10% quite a few times. This has been observed on
MP systems.

Example (checks are done in lapic_clockintr against TSC & i8254 counter 
(named tick here)):

Apr 10 09:08:42 pip /netbsd: cpu1: tick delta exceeds 10%: delta 8529, 
last 11932, tick 2458056983, last 2458036522
Apr 10 09:08:42 pip /netbsd: cpu1: TSC delta exceeds 10%: delta 
15744636, last 22028555, tsc 50121169
Apr 10 09:08:42 pip /netbsd: cpu1: tick delta exceeds 10%: delta 17057, 
last 20461, tick 2458060387, last 2458056983
Apr 10 09:08:42 pip /netbsd: cpu1: TSC delta exceeds 10%: delta 
31488769, last 37773191, tsc 87894360
Apr 10 09:08:42 pip /netbsd: cpu1: tick delta exceeds 10%: delta 8527, 
last 3404, tick 2458072318, last 2458060387
Apr 10 09:08:42 pip /netbsd: cpu1: TSC delta exceeds 10%: delta 
15741978, last 6284422, tsc 94178782
Apr 10 09:08:54 pip /netbsd: cpu1: tick delta exceeds 10%: delta 11387, 
last 11932, tick 2472389745, last 2472366426
Apr 10 09:08:54 pip /netbsd: cpu1: TSC delta exceeds 10%: delta 
21024446, last 22028028, tsc 736110818
Apr 10 09:08:54 pip /netbsd: cpu1: tick delta exceeds 10%: delta 22774, 
last 23319, tick 2472390290, last 2472389745
Apr 10 09:08:54 pip /netbsd: cpu1: TSC delta exceeds 10%: delta 
42041693, last 43052474, tsc 779163292
Apr 10 09:08:54 pip /netbsd: cpu1: tick delta exceeds 10%: delta 11385, 
last 545, tick 2472402220, last 2472390290
Apr 10 09:08:54 pip /netbsd: cpu1: TSC delta exceeds 10%: delta 
21008805, last 1010781, tsc 780174073
Apr 10 09:09:25 pip /netbsd: cpu0: tick delta exceeds 10%: delta 10058, 
last 11932, tick 2509401376, last 2509379386
Apr 10 09:09:25 pip /netbsd: cpu0: TSC delta exceeds 10%: delta 
18571607, last 22028292, tsc 349198359
Apr 10 09:09:25 pip /netbsd: cpu0: tick delta exceeds 10%: delta 20112, 
last 21990, tick 2509403254, last 2509401376
Apr 10 09:09:25 pip /netbsd: cpu0: TSC delta exceeds 10%: delta 
37133163, last 40599899, tsc 389798258
Apr 10 09:09:25 pip /netbsd: cpu0: tick delta exceeds 10%: delta 10048, 
last 1878, tick 2509415180, last 2509403254
Apr 10 09:09:25 pip /netbsd: cpu0: TSC delta exceeds 10%: delta 
18550971, last 3466736, tsc

David Laight wrote:

>On Sat, Apr 08, 2006 at 01:52:03PM -0700, Matt Thomas wrote:
>  
>
>>Exactly where and what are you counting?  Do you add members to lwp's l_md
>>struct?  I assume you've added stuff to cpu_info as well.
>>    
>>
>
>I've added 2 items to the struct lwp:
>	uint32_t l_syscall_time;
>	uint64_t *l_syscall_counter;
>
>Syscall entry does:
>	l->l_syscall_time = cpu_counter32();
>	l->l_syscall_counter = &syscall_times[code];
>	syscall_counts[code]++;
>
>Syscall exit and mi_sleep do:
>	*l->l_syscall_counter += cpu_counter32() - l->syscall_time;
>  
>
you are certainly taking wraps into account (l->syscall_time > 
cpu_counter32()), right?

>Return from m_sleep does:
>	l->l_syscall_time = cpu_counter32();
>
>(code added to syscall_plain/fancy in i386/i386/syscalls.c)
>
>Except that there are some other macros lurking.
>(The code I had working was a bit different to the above).
>syscall_counts[] and syscall_times[] can be retrieved through the
>sysctl interface and displayed by systat.
>
>Unfortunately, if a process is switched out during an interrupt the
>wrong places will get incremented, and an interrupt in the kernel
>will count as system call time.
>
>Now I can add the elapsed times to p->p_[usi]ticks.  But then really
>do need to know about non-system call entries into the kernel.
>
>The basic code is MI, it just needs the macro calls added to the
>relevant syscall code.
>
>I'm also not really sure what to do about emulations.  But that is
>just a SMOP.
>
>	David
>
>  
>
Frank

-- 
***********Confidentiality/Limited Liability Statement***************

This email and its contents do not constitute a commitment by Acrys Consult.
This message is confidential and may contain privileged information. The
information contained in this communication is intended solely for the use
of the individual or entity to whom it is addressed and others authorized to
receive it. It may contain confidential or legally privileged information. If
you are not the intended recipient you are hereby notified that any disclosure,
copying, distribution or taking any action in reliance on the contents of this
information is strictly prohibited and may be unlawful. If you have received
this communication in error, please notify us immediately by responding to
this email and then deleting it from your system. As reliability of the
Internet cannot be guaranteed, we are neither liable for proper nor complete
transmission of this or any other such communications.