Re: getrusage() problems with user vs. system time reporting

To: NetBSD Kernel Technical Discussion List <tech-kern%NetBSD.org@localhost>
Subject: Re: getrusage() problems with user vs. system time reporting
From: "Greg A. Woods" <woods%planix.ca@localhost>
Date: Fri, 28 Oct 2011 15:54:57 -0700

On Thu, Oct 27, 2011 at 07:24:03 +0300, Jukka Ruohonen wrote:
> 
> This is a well-known bug that is over 15 years old. The much simpler tests in
> atf(7) replicate it well. The PR used to track it is kern/30115. Michael van
> Elst suggested a couple of reasonable (IMO) solutions therein.

Part of the point of this new discussion is that I am attempting,
perhaps poorly, to show that PR#30115 and its historical counterpart,
along with similar reports in the PR databases of other *BSDs,
represent a separate problem, distinct from the one I am describing.

It is possible that the problem I'm trying to show here shares, or is at
least related to, the same cause as the problem shown in PR#30115.
That's part of what I'm trying to discover here.

However, FreeBSD's solution to PR#30115 is not in any way a valid
solution to the problem I'm trying to show here, regardless of whether
the problem I'm trying to show has the same cause or not.  That solution
will prevent the little wobbles that the simplistic tests demonstrate,
but it won't make overall getrusage() timing results any more meaningful
and consistent.  Indeed it may even make them a wee bit more wrong,
though I'm not sure this last part matters so much.

From what I understand currently, especially if the root causes of these
problems are related, David's proposed solution would be on the
right track:

On Fri, 28 Oct 2011 08:48:19 +0100, David Laight wrote:
> 
> If you are willing to take the cost of getting the timestamp (in
> some units) on every kernel entry/exit (as well as the process switch)
> then the time in usr/sys can be added to the clock tick counts and
> used when the actual execution time is split.
> (Doing it that way means the units don't have to be THAT accurate)

Hmmm.... if we could save the current time on every kernel entry, and
then increment a new "l_systime" variable with the elapsed time on every
return to user mode, and of course use the same clock as is used for
l_rtime (i.e. binuptime()), then the only wild-card variable left is
interrupt time.
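
To make that idea concrete, here is a rough sketch of the bookkeeping
it would imply.  It is only an illustration under stated assumptions:
the struct, its fields other than the proposed "l_systime", and the
acct_*() function names are all hypothetical and are not the kernel's
struct lwp or any existing kernel interface.  Timestamps are passed in
as nanosecond counts so that any monotonic clock could supply them.
Note that the open interval also has to be closed and reopened around
context switches, or time spent sleeping in the kernel would be charged
as system time.

```c
#include <stdint.h>

/* Hypothetical per-thread accounting record (not the real struct lwp). */
struct lwp_acct {
	uint64_t l_rtime_ns;	/* total on-CPU time, in the role of l_rtime */
	uint64_t l_systime_ns;	/* proposed: on-CPU time spent in the kernel */
	uint64_t l_oncpu_ns;	/* timestamp when last switched onto a CPU */
	uint64_t l_kentry_ns;	/* timestamp when the open kernel interval began */
	int	 l_inkernel;	/* nonzero between kernel entry and exit */
};

/* Kernel entry (syscall or trap): remember when kernel mode began. */
void
acct_kernel_enter(struct lwp_acct *l, uint64_t now_ns)
{
	l->l_kentry_ns = now_ns;
	l->l_inkernel = 1;
}

/* Return to user mode: charge the elapsed interval to system time. */
void
acct_kernel_exit(struct lwp_acct *l, uint64_t now_ns)
{
	if (l->l_inkernel) {
		l->l_systime_ns += now_ns - l->l_kentry_ns;
		l->l_inkernel = 0;
	}
}

/* Switched off the CPU: close the run interval and any kernel interval. */
void
acct_switch_out(struct lwp_acct *l, uint64_t now_ns)
{
	l->l_rtime_ns += now_ns - l->l_oncpu_ns;
	if (l->l_inkernel)
		l->l_systime_ns += now_ns - l->l_kentry_ns;
}

/* Switched onto a CPU: start a new run interval, and restart the kernel
 * interval if the thread was descheduled while in the kernel. */
void
acct_switch_in(struct lwp_acct *l, uint64_t now_ns)
{
	l->l_oncpu_ns = now_ns;
	if (l->l_inkernel)
		l->l_kentry_ns = now_ns;
}
```

The cost question is then just the cost of one clock read in each of
these four places.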

Just how expensive are updatertime() and the associated bookkeeping it
needs?   Hmmm....

So, then user time would be the difference between the sum of thread
runtimes and the sum of thread systimes, less some value for interrupt
time.
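
Spelled out as arithmetic, that derivation might look like the sketch
below.  The struct and function names are made up for illustration, and
the interrupt figure is simply passed in, since how to obtain it is
exactly the open question.  The result is clamped at zero in case the
estimates overshoot the measured runtime.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-thread totals, in nanoseconds. */
struct thread_times {
	uint64_t rtime_ns;	/* measured on-CPU time for the thread */
	uint64_t systime_ns;	/* measured in-kernel time for the thread */
};

/*
 * User time for a process: sum of thread runtimes, minus the sum of
 * thread system times, minus some value for interrupt time.
 */
uint64_t
proc_user_ns(const struct thread_times *t, size_t n, uint64_t intr_ns)
{
	uint64_t run = 0, sys = 0, charged;
	size_t i;

	for (i = 0; i < n; i++) {
		run += t[i].rtime_ns;
		sys += t[i].systime_ns;
	}
	charged = sys + intr_ns;
	/* Never report negative user time if the inputs disagree. */
	return run > charged ? run - charged : 0;
}
```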

Ideally interrupt time would also be measured similarly (using the same
clock again) by the interrupt dispatcher and accumulated against
whatever thread (kernel or user) was interrupted (e.g. in l_intrtime).
However I don't quite see how this could be possible to do safely,
especially in conjunction with SMP, though I'm not familiar enough with
the details of the locking that might be required to know for sure.  If
I'm wrong and it is possible to do, then directly measuring and
accounting for interrupt time would also be a very good thing (assuming
it wouldn't be so costly as to radically change overall system
performance).

In any case, with the current state of affairs I'm beginning to think the
interrupt ticks are the real wild-cards here, and I want to modify
getrusage() to return a new ru_itime value as well (or add a new system
call to return the raw p_rtime and p_*ticks values along with stathz).
After all, how likely is it that the average time accounted to
p_iticks will actually match the true time used by interrupts?  I'm
guessing average interrupt service times are far less than stathz
intervals.
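
To illustrate the concern, here is a small sketch of a tick-proportional
split, assuming the measured runtime is divided among user, system and
interrupt time in the ratio of the sampled tick counts.  This is not
the kernel's code; the names are hypothetical, and the choice to charge
everything to user time when there are no samples is arbitrary.

```c
#include <stdint.h>

/* Hypothetical result of apportioning runtime, in microseconds. */
struct split {
	uint64_t user_us;
	uint64_t sys_us;
	uint64_t intr_us;
};

/*
 * Divide rtime_us in proportion to the statclock tick counts.
 * The multiplication can overflow for very large inputs; a real
 * implementation would have to guard against that.
 */
struct split
split_by_ticks(uint64_t rtime_us, uint64_t uticks, uint64_t sticks,
    uint64_t iticks)
{
	struct split s = { 0, 0, 0 };
	uint64_t tot = uticks + sticks + iticks;

	if (tot == 0) {
		/* No samples at all: arbitrary choice for this sketch. */
		s.user_us = rtime_us;
		return s;
	}
	s.user_us = rtime_us * uticks / tot;
	s.sys_us = rtime_us * sticks / tot;
	s.intr_us = rtime_us * iticks / tot;
	return s;
}
```

With one second of runtime and tick counts of 97, 2 and 1, a single
interrupt tick is charged as 10000 microseconds, however brief the
interrupt handler that happened to be sampled actually was.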

I'm also wondering if I can force "stathz=0" at runtime, perhaps with a
sysctl, so that I can also avoid the perturbations caused by having a
different (and possibly changing) statclock rate.  It's all well and
good to try to reduce the cost of statclock handling by giving it a much
lower rate than hardclock, but in the end that just makes the division
of p_rtime as returned by getrusage() effectively meaningless, and thus
some of the work done by statclock may as well not be done at all when
stathz is non-zero.  Skipping it would be much less misleading, to say
the least.
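
One way a low sampling rate can mislead is sketched below as a toy
simulation, not a model of any real kernel.  It assumes a strictly
periodic workload (user_us of user mode followed by system mode for
the rest of each period) and a perfectly regular sampling clock; the
function name and parameters are made up for illustration.  Some
systems jitter the statistics clock to reduce exactly this kind of
aliasing, which this sketch deliberately ignores.

```c
#include <stdint.h>

/*
 * Sample a periodic workload every 1/stathz seconds for total_us
 * microseconds and return the share of samples that landed in system
 * mode, in parts per thousand.
 */
unsigned
sampled_sys_permille(unsigned stathz, uint64_t period_us, uint64_t user_us,
    uint64_t total_us)
{
	uint64_t step, t, ticks = 0, sticks = 0;

	if (stathz == 0 || stathz > 1000000 || period_us == 0)
		return 0;	/* nothing sensible to sample */
	step = 1000000 / stathz;
	for (t = 0; t < total_us; t += step) {
		ticks++;
		/* Past the user portion of the period means system mode. */
		if (t % period_us >= user_us)
			sticks++;
	}
	return ticks ? (unsigned)(sticks * 1000 / ticks) : 0;
}
```

For a workload that spends 9 ms in user mode and 1 ms in system mode
out of every 10 ms, sampling at 1000 Hz reports the true 10% system
share, while sampling at 100 Hz in step with the workload reports no
system time at all.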

-- 
                                                Greg A. Woods
                                                Planix, Inc.

<woods%planix.com@localhost>       +1 250 762-7675        http://www.planix.com/

