Subject: user profiling improvements
To: tech-kern@netbsd.org <tech-kern@netbsd.org>
From: Ethan Solomita <ethan@geocast.com>
List: tech-kern
Date: 08/10/2000 19:27:12
	I'm not sure how important user profiling is in the NetBSD community --
I haven't gotten many comments on past posts. But I've made some
improvements for internal use, and I'm trying to decide whether they
should be rolled back into the main tree. Basically, I'm not sure whether
the implementation will be controversial, and I'm open to alternative
suggestions.

	Here's the problem I'm trying to overcome: user profiling today works
by having statclock() called periodically from the clock interrupt; if
the interrupted frame's PC is in user land, statclock() increments the
appropriate bucket in the process's profile counts. If the process is in
the middle of a system call when the clock interrupt arrives, or if the
clock interrupt interrupts some other interrupt, the frame passed to
statclock() contains a kernel PC, and so no profiling information is
recorded.

	This is a major limitation: all system calls appear to take zero
time. For CPU-bound apps this isn't so bad, but for I/O-bound apps it
makes the profile nearly useless.

	I want the kernel to do two extra things:

1. In statclock(), if the frame's PC points into kernel space, extract
the user's last PC before entering the kernel and increment the bucket
for that PC.

2. Charge time spent sleeping (waiting to be woken up) to the user's
last PC before entering the kernel, incrementing that PC's bucket for
the full time spent asleep.
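	To make the two items concrete, here is a rough userland sketch in C,
not a kernel patch. The structure uprof_sketch, the SKETCH_KERNBASE
boundary (borrowed from the 32-bit MIPS split, where kernel KSEG0 starts
at 0x80000000), profile_pc(), and its saved_user_pc argument are all
hypothetical. The PC-to-bucket arithmetic is modeled on the profil(2)
scale convention, where a scale of 0x10000 maps every two bytes of text
onto one 16-bit counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical profil(2)-style buffer: 16-bit counters, with pr_scale a
 * 16.16 fixed-point factor (0x10000 = two bytes of text per counter).
 */
struct uprof_sketch {
    uint16_t  *pr_base;     /* counter array */
    size_t     pr_size;     /* size of pr_base in bytes */
    uintptr_t  pr_off;      /* lowest PC covered */
    uint32_t   pr_scale;    /* PC-offset to buffer-offset scale */
};

/* Assumed user/kernel boundary (32-bit MIPS KSEG0 starts here). */
#define SKETCH_KERNBASE ((uintptr_t)0x80000000u)

/*
 * Item 1: use the interrupted PC if it is a user PC; otherwise fall back
 * to the user PC saved when the process last entered the kernel.
 */
static uintptr_t
profile_pc(uintptr_t frame_pc, uintptr_t saved_user_pc)
{
    return frame_pc < SKETCH_KERNBASE ? frame_pc : saved_user_pc;
}

/*
 * Add 'ticks' to the counter covering 'pc'.  Item 2 would call this once
 * with the whole sleep converted to ticks.  Out-of-range PCs are ignored.
 */
static void
uprof_charge(struct uprof_sketch *prof, uintptr_t pc, unsigned ticks)
{
    uint64_t off;

    assert(prof != NULL && prof->pr_base != NULL);
    if (pc < prof->pr_off)
        return;
    /* Byte offset into the buffer, rounded down to a counter boundary. */
    off = (((uint64_t)(pc - prof->pr_off) * prof->pr_scale) >> 16) &
        ~(uint64_t)1;
    if (off + sizeof(uint16_t) > prof->pr_size)
        return;
    prof->pr_base[off / sizeof(uint16_t)] += (uint16_t)ticks;
}
```

	With this shape, a tick taken inside read()'s system call and a long
sleep in the same call both land in the bucket for the libc read() PC,
instead of being dropped.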

	The first item above solves the problem of "sys time", i.e., time
spent actively running in the kernel. The second solves the problem of
the process sleeping, e.g., waiting for I/O to complete. If the app calls
read(), I want the user profile to charge all of the time until the read
completes to the read() routine in libc.

	Neither of the above is all that hard. The second one is accomplished
with a timestamp in the pstats structure: mi_switch() clears it before
calling cpu_switch(); setrunnable() sets it using microtime(); and after
cpu_switch() returns, mi_switch() checks it and, if it is set, computes
the sleep time by subtracting its own pre-cpu_switch() timestamp from
setrunnable()'s timestamp.
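	A userland sketch of that timestamp handshake follows. The field
ps_wakeup stands in for the proposed new pstats timestamp, and
switch_out(), make_runnable(), and switch_in() are hypothetical
stand-ins for mi_switch() before cpu_switch(), setrunnable(), and
mi_switch() after cpu_switch(), respectively. Times are passed in rather
than read with microtime().

```c
#include <assert.h>
#include <sys/time.h>

/*
 * Hypothetical sketch, not NetBSD code.  ps_wakeup stands in for the
 * proposed new pstats timestamp; all-zero means "not set".
 */
struct pstats_sketch {
    struct timeval ps_wakeup;
};

/* mi_switch(), before cpu_switch(): clear the stamp, remember "now". */
static void
switch_out(struct pstats_sketch *ps, struct timeval *before,
    const struct timeval *now)
{
    ps->ps_wakeup.tv_sec = 0;
    ps->ps_wakeup.tv_usec = 0;
    *before = *now;
}

/* setrunnable(): record when the sleeping process became runnable. */
static void
make_runnable(struct pstats_sketch *ps, const struct timeval *now)
{
    ps->ps_wakeup = *now;
}

/*
 * mi_switch(), after cpu_switch() returns: if the stamp was set, return
 * the time slept in microseconds (wakeup minus switch-out); else 0.
 */
static long long
switch_in(const struct pstats_sketch *ps, const struct timeval *before)
{
    long long us;

    if (ps->ps_wakeup.tv_sec == 0 && ps->ps_wakeup.tv_usec == 0)
        return 0;
    us = (long long)(ps->ps_wakeup.tv_sec - before->tv_sec) * 1000000 +
        (long long)(ps->ps_wakeup.tv_usec - before->tv_usec);
    assert(us >= 0);
    return us;
}
```

	In this sketch, a process that was only preempted never passes
through the setrunnable() stand-in, so its stamp stays clear and it is
charged nothing; the returned microseconds would still need converting
to profiling ticks before being added to the user's bucket.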

	The problem that makes it ugly is: how do you get the user's last PC?
Normally statclock() gets it from the frame, but in case 1 (above) the
frame's PC is the kernel PC at which the clock interrupt hit, and in
case 2 there is no frame at all.

	For the mips platform, I wrote a short routine, cpu_getepc(), which
looks at curpcb and, if it is non-null, returns the user's PC saved in
the pcb. In pcb.h I added a "#define HAS_GETEPC 1" (as well as a
function prototype), so that the kernel code implementing 1 and 2
(above) is all surrounded by an "#ifdef HAS_GETEPC". This works well
enough, but I wanted to give people a chance to whine and moan about it
first. 8-)
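	Roughly, the port-side hook and the machine-independent guard fit
together as below. struct pcb_sketch, its pcb_user_pc field,
curpcb_sketch, last_user_pc(), the uintptr_t return type, and using 0 to
mean "no user PC" are all assumptions for illustration; the real MIPS
pcb layout and cpu_getepc() prototype may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: the real pcb layout and curpcb are per-port. */
struct pcb_sketch {
    uintptr_t pcb_user_pc;      /* user PC saved on kernel entry */
};
static struct pcb_sketch *curpcb_sketch;   /* NULL: no process context */

/* Would live in the port's pcb.h: advertise the hook and prototype it. */
#define HAS_GETEPC 1
uintptr_t cpu_getepc(void);

/* Port-specific: return the user's last PC, or 0 if there is no pcb. */
uintptr_t
cpu_getepc(void)
{
    return curpcb_sketch != NULL ? curpcb_sketch->pcb_user_pc : 0;
}

/* MI code only uses the hook when the port provides it. */
static uintptr_t
last_user_pc(void)
{
#ifdef HAS_GETEPC
    return cpu_getepc();
#else
    return 0;   /* no way to recover it; caller skips the extra profiling */
#endif
}
```

	Ports that don't define HAS_GETEPC keep today's behavior, which is
one argument for the #ifdef approach over requiring every port to
implement the hook at once.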

	Also, although I believe that the new profiling behavior above is the
Right Thing (TM), I wanted to give people a chance to say that it should
be an option, and not default.

	I'll move this discussion to current-users, BTW; I just wanted to give
tech-kern first shot at attacking the implementation issues.

	Thanks!
	-- Ethan