port-mips: Re: mips kernel profiling

Subject: Re: mips kernel profiling
To: Ethan Solomita <ethan@geocast.com>
From: Simon Burge <simonb@netbsd.org>
List: port-mips
Date: 04/19/2000 13:58:52

Ethan Solomita wrote:

> 	I now have kernel profiling working on our MIPS platform, with the
> increase in stack size for mcount and Simon's noprof spl functions. (I
> haven't tried the inline assembler code -- what does that do to the size
> of the kernel?) However, I'm used to a different implementation, and I
> wanted to gather opinions.

It should add almost nothing to the kernel - MCOUNT_ENTRY and MCOUNT_EXIT
are only in __mcount.  I don't have any hard data on the performance
impact, but it saves a function call...

> 	The way it works today, the actual time spent within a function is
> accurate, since that's recorded from a clock interrupt. But the call
> graph information is generated per function call, so it isn't based upon
> time but on number-of-calls.
> 
> 	simplified eg: write() always takes twice as long as read(), let's say.
> If they're both called equally often by a given function, then they'll
> be listed (in the gprof output for that function) with equal weightings.
> But the writes actually took twice as long as the reads. If you're using
> the profiling to see what is actually happening in terms of code flow,
> then this is what you want. If you're looking for performance
> bottlenecks, this is useless.
> 
> 	What I'm used to is having the clock interrupt perform a backtrace in
> the interrupt routine, and fill in the call-graph information at clock
> interrupt time. In return, you do *not* need a profiled kernel. No
> mcount function. No performance hit on every function call. No separate
> kernels for profiling and regular use.
> 
> 	And, for the example given above, you'd see your function as spending
> twice as much time in write() as in read(). In fact, you'll actually see
> exact times you spent in your child functions, rather than the
> number-of-function-calls number.
> 
> 	I don't want to claim that this is superior to the current
> implementation, since they both have uses. I'm mostly seeking people's
> opinions about the desirability of this alternate implementation.

The first thing that springs to mind is what if a function is called
frequently, but is never active during the clock interrupt?  I'd
be curious to see comparisons of the call graphs between the two
implementations.
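
To make the difference concrete, here is a toy simulation of the two ways
of building call-graph arcs.  It is only a sketch: the workload, the names
f/read/write/tiny, the durations and the sampling period are all made up
for illustration, and the timing is deliberately contrived so that tiny()
never lines up with a clock tick.

```python
from collections import Counter

# Hypothetical workload (not measured data): per iteration, f() calls
# read() for 30 time units, write() for 60, tiny() for 1, and then does
# 9 units of its own work.
ITERATION = [("read", 30), ("write", 60), ("tiny", 1), (None, 9)]
ITERATIONS = 100
SAMPLE_PERIOD = 10  # assumed clock interrupt period, in time units
SAMPLE_OFFSET = 5   # assumed time of the first clock interrupt


def build_timeline():
    """Return contiguous (start, end, stack) intervals for the workload."""
    timeline, t = [], 0
    for _ in range(ITERATIONS):
        for callee, duration in ITERATION:
            stack = ("f",) if callee is None else ("f", callee)
            timeline.append((t, t + duration, stack))
            t += duration
    return timeline


def count_based_arcs(timeline):
    """mcount-style: bump the caller->callee arc once per call."""
    arcs = Counter()
    for _, _, stack in timeline:
        if len(stack) > 1:
            arcs[(stack[-2], stack[-1])] += 1
    return arcs


def sample_based_arcs(timeline):
    """Backtrace-style: at each clock tick, bump every arc on the stack."""
    arcs = Counter()
    end_time = timeline[-1][1]
    idx = 0
    for t in range(SAMPLE_OFFSET, end_time, SAMPLE_PERIOD):
        # Advance to the interval that contains time t.
        while timeline[idx][1] <= t:
            idx += 1
        stack = timeline[idx][2]
        for caller, callee in zip(stack, stack[1:]):
            arcs[(caller, callee)] += 1
    return arcs
```

With these made-up numbers, count_based_arcs() gives read, write and tiny
100 calls each, while sample_based_arcs() gives f->read 300 samples,
f->write 600 and f->tiny none at all.  So the sampled graph shows the 2:1
time split, but the frequently called tiny() drops out of it entirely.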

Also, isn't MIPS stack unwinding inherently unreliable?  It certainly seems
so in userland (fencepost errors and such from gdb) - is the kernel a
little better behaved in this respect?
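
For context on why unwinding is fragile: without a guaranteed frame
pointer, a heuristic unwinder typically scans backwards from the pc
looking for the prologue's "addiu sp,sp,-N" and "sw ra,off(sp)".  The
sketch below shows that idea only.  It assumes the usual compiler-generated
prologue, scan_prologue() and its parameters are hypothetical names, and it
is not the NetBSD or gdb code.

```python
ADDIU_SP_SP = 0x27BD0000  # addiu sp, sp, imm16
SW_RA_SP = 0xAFBF0000     # sw ra, imm16(sp)
HI16 = 0xFFFF0000


def sign_extend16(value):
    """Interpret a 16-bit immediate as a signed integer."""
    return value - 0x10000 if value & 0x8000 else value


def scan_prologue(words, pc_index, max_scan=64):
    """Scan backwards from pc_index over already-executed instructions.

    Returns (frame_size, ra_offset), where ra_offset is None if no
    "sw ra" was seen before the pc, or None if no stack adjustment
    was found within max_scan instructions.
    """
    ra_offset = None
    lowest = max(0, pc_index - max_scan)
    for i in range(pc_index - 1, lowest - 1, -1):
        word = words[i]
        if (word & HI16) == SW_RA_SP:
            ra_offset = sign_extend16(word & 0xFFFF)
        elif (word & HI16) == ADDIU_SP_SP:
            imm = sign_extend16(word & 0xFFFF)
            if imm < 0:  # negative adjustment: frame allocation
                return (-imm, ra_offset)
    return None


# A hand-assembled example function:
EXAMPLE = [
    0x27BDFFE0,  # addiu sp, sp, -32
    0xAFBF001C,  # sw    ra, 28(sp)
    0x00000000,  # nop
    0x8FBF001C,  # lw    ra, 28(sp)
    0x03E00008,  # jr    ra
    0x27BD0020,  # addiu sp, sp, 32  (delay slot)
]
```

Here scan_prologue(EXAMPLE, 3) returns (32, 28), scan_prologue(EXAMPLE, 1)
returns (32, None) because ra has not been saved yet, and
scan_prologue(EXAMPLE, 0) returns None because the frame does not exist
yet.  Those last two are the sort of fencepost cases where a sample taken
in the first couple of instructions of a function gets the wrong caller.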

Simon.