Subject: Re: Some kernel profiling on a 5000/133 with -current
To: Ken Wellsch <kwellsch@tampabay.rr.com>
From: Chuck Silvers <chuq@chuq.com>
List: port-pmax
Date: 08/30/2001 13:31:46
hi,

sure it's possible to make that change, but it probably won't help all
that much.  but go ahead and try it and see how the total cpu time taken
by memcpy() changes.  it will probably be pretty much the same
(within the margin of error for your test).

the reason for this is that the profiler can't tell how how much of the
time used by a given function is related to each of the callers, so it
just assigns cpu time to the callers proportional to the number of calls.

for example, if we have this program:


char buf[100000000];

void
c(int v)
{
	int i;
	char *cp;

	if (v) {
		return;
	}

	for (i = 0; i < sizeof(buf); i++) {
		buf[i] = 1;
	}
}

void
a()
{
	c(1);
}

void
b()
{
	c(0);
}

int
main()
{
	int i;

	for (i = 0; i < 1000000; i++) {
		a();
	}
	b();
}



we get this gprof output:

index % time    self  children    called     name
                                                 <spontaneous>
[1]    100.0    0.00    1.03                 main [1]
                0.01    1.02 1000000/1000000     a [2]
                0.00    0.00       1/1           b [4]
-----------------------------------------------
                0.01    1.02 1000000/1000000     main [1]
[2]    100.0    0.01    1.02 1000000         a [2]
                1.02    0.00 1000000/1000001     c [3]
-----------------------------------------------
                0.00    0.00       1/1000001     b [4]
                1.02    0.00 1000000/1000001     a [2]
[3]     99.0    1.02    0.00 1000001         c [3]
-----------------------------------------------
                0.00    0.00       1/1           main [1]   
[4]      0.0    0.00    0.00       1         b [4]
                0.00    0.00       1/1000001     c [3]


according to gprof, all the time is taken by the 1000000 calls to
c() made by a(), but looking at the code it's obvious that
all the time is really taken by the one call to c() made by b().

in your case, all the time used by memcpy() is really due to things like
copyin() and copyout(), even though there are a lot of calls from microtime().

-Chuck


On Thu, Aug 30, 2001 at 01:04:02PM -0400, Ken Wellsch wrote:
> On Fri, Aug 24, 2001 at 03:39:50PM -0400, Ken Wellsch wrote:
> >
> > index % time    self  children    called     name
> > -----------------------------------------------------------------
> > [10]    19.7  249.70    0.00 2023108         memcpy [10]
> > -----------------------------------------------
> > [16]     7.8    0.36   98.55  796813         microtime <cycle 8> [16]
> >                98.35    0.00  796813/2023108     memcpy [10]
> > -----------------------------------------------
> >	...
> 
> I kept looking at this wondering, where the heck does microtime()
> ever call memcpy()?
> 
> Well, I pondered this for a while and in looking at the code in
> pmax/machdep.c I notice two "structure copies."
> 
> Sure enough, when I look at the assembler produced by "gcc" I see:
> 
> 	 ...
>         move    $4,$16
>         move    $17,$2
>         la      $5,time
>         .set    noreorder
>         .set    nomacro
>         jal     memcpy
>         li      $6,8                    # 0x8
>         .set    macro
>         .set    reorder
> 	 ...
> 
> which matches up with the line "*tvp = time;" below:
> 
>   microtime(tvp)
>         struct timeval *tvp;
>   { 
>         int s = splclock();
>         static struct timeval lasttime; 
>  
>         *tvp = time;
> 
> 	...
> 
> Apparently "time" is treated as volatile, while the closing
> structure copy "lasttime = *tvp;" does not involve memcpy().
> 
> So given that this code is executing with splclock(), is it
> possible to replace
> 
>         *tvp = time;
> 
> by
> 
>         tvp->tv_sec  = time.tv_sec;
>         tvp->tv_usec = time.tv_usec;
> 
> and thus avoid this expensive and ever so frequent call?
> 
> Cheers,
> 
> -- Ken