Subject: Re: Some kernel profiling on a 5000/133 with -current
To: Ken Wellsch <kwellsch@tampabay.rr.com>
From: Chuck Silvers <chuq@chuq.com>
List: port-pmax
Date: 08/30/2001 19:04:03
hi ken,

On Thu, Aug 30, 2001 at 05:02:48PM -0400, Ken Wellsch wrote:
> Hi Chuck,
> 
> I generally pay no attention to profile data beyond the flat call graph
> at the top of the gprof output.  But I did look at the call graph to see
> who was using the function.  True enough, I was influenced by the sorting
> provided by gprof.
> 
> So is it safe to say the first table is reasonably accurate? e.g. that
> memcpy is by far the dominant function consuming kernel time in my test
> (ignoring the profile overhead)? Is the total call count accurate?  Is
> the assigned call ratio in the call graph accurate? e.g. in my table below
> is memcpy being called nearly 50% of all calls, by microtime?  Or
> are the called-by assignment counts also suspect?

memcpy() does tend to be the most cpu-consuming function in any workload
that does a lot of file reads and writes, yes.  since that's pretty much
all that bonnie does, having that at the top of the list is expected.

the counts for number of calls are accurate with a few caveats:
static functions don't show up in the list, so they confuse the output.
assembly code with labels in the middle of "functions" somewhat mangle
the output too.  if you think you're seeing the effects of this,
make the static function non-static or rename the assembly labels to
a number and use the (eg.) "3f" style of referring to them.


> I confess I was less concerned about consumed CPU time as I was with the
> large number of calls that I didn't think were necessary.  I guess if
> function call overhead on a MIPS system is low it is more of a waste of
> time to suggest changing anything.

an empty function call doesn't take *that* much time on any platform.

-Chuck


> For what it is worth, I did find, doing long "bonnie" runs with 500Mb
> file-size, that the performance of the assembler memcpy() and the
> portable C version were about the same.
> 
> It is hardly sufficient testing, but I did make one 500Mb bonnie
> run with my change below and it appeared to drop over a minute from
> the ~52 minute run time (under a profiling kernel).
> 
> I also optimized out the call to platform.clkread for a MIPS1 only
> 3MIN kernel because as the code exists, it results in an empty call
> on non MIPS3 3MIN boxes (the dominant 3MIN box out there).
> 
> diff -u /sys/arch/pmax/pmax/machdep.c /tmp/machdep.c
> --- /sys/arch/pmax/pmax/machdep.c       Sun Aug 26 09:05:06 2001
> +++ /tmp/machdep.c      Thu Aug 30 16:49:44 2001
> @@ -772,10 +772,14 @@
>         int s = splclock();
>         static struct timeval lasttime;
>  
> -       *tvp = time;
> -#if (DEC_3MIN + DEC_MAXINE + DEC_3MAXPLUS) > 0
> +       tvp->tv_sec  = time.tv_sec;
> +       tvp->tv_usec = time.tv_usec;
> +
> +#if defined(DEC_MAXINE) || defined(DEC_3MAXPLUS) || \
> +    ( defined(DEC_3MIN) && defined(MIPS3) )
>         tvp->tv_usec += (*platform.clkread)();
>  #endif
> +
>         if (tvp->tv_usec >= 1000000) {
>                 tvp->tv_usec -= 1000000;
>                 tvp->tv_sec++;
> 
> Thanks for your reply!
> 
> -- Ken