current-users: Re: one program, one machine, two wildly different execution times

Subject: Re: one program, one machine, two wildly different execution times
To: David Maxwell <david@vex.net>
From: Jim Bernard <jbernard@mines.edu>
List: current-users
Date: 04/20/2003 12:11:04

On Sun, Apr 20, 2003 at 01:03:57AM -0400, David Maxwell wrote:
> On Sun, Apr 20, 2003 at 10:14:30AM +1000, Ben Elliston wrote:
> > Jim Bernard <jbernard@mines.edu> writes:
> > 
> > >   My guess is that the two different execution times represent runs
> > > in which the program was loaded with different alignments of
> > > double-precision variables, one efficient and the other not.  I
> > > tried compiling with -malign-double, but that made no difference.
> > > I've tried a large number of variations in compile-time optimization
> > > flags, including no optimization, and nothing made much of a
> > > difference.
> > 
> > I doubt that would make any difference; your variables should be bound
> > to the same address in every run.  Try compiling your program with -pg
> > and see if gprof gives you any insight into where the extra time is
> > being spent.  That might be a good starting point.
> 
> Ben's suggestion is a good one.

  Gprof shows a general slowdown, though the computation time is, in both
cases, almost all in two routines.

> Another thing you can try, which doesn't require recompiling, is to get
> a ktrace of the process for each runtime. I know you said it doesn't do
> much IO, but the timestamps on the system calls should let you narrow
> the extra delay down. At the least, you may be able to determine whether
> it's a single large hit, or many small ones.

  Ktrace's timestamps show all the time going into the calculation, which
is a single step in its output.  That is, there are entries for the various
startup items and for reading the input, then a big time jump to the next
entry, during which the calculation takes place, then entries for output
and termination.  In short, no new insight.

> Depending on the complexity of your program, you could also start
> inserting exit()s at various points, and recompiling. A binary search
> should narrow down where the extra time comes from, without too much
> trouble (as long as the fast/slow runs are fairly evenly distributed).

  The fast and slow runs appear to be fairly evenly distributed.

  See my response to Ben's message for some additional, possibly relevant,
info.

  Thanks for the suggestions.  If you have more, I'll be glad to hear them.

--Jim