Subject: one program, one machine, two wildly different execution times
To: current-users@netbsd.org
From: Jim Bernard <jbernard@mines.edu>
List: current-users
Date: 04/19/2003 14:49:52
  While testing some changes to a program, I noticed a dramatic change
in the execution time of a test case when I expected very little change.
Further investigation eventually revealed that, given the exact
same executable file, the exact same input, and the exact same machine,
the test case executes in either about 8.7 sec or about 13.2 sec.  Variations
about those values are typically less than 0.2 sec.  What's more, once I
have run the program in a given instance of the shell (bash) and obtained
one of those two times, all subsequent invocations in that shell complete
in the same amount of time (to within the variation).  But other shell
instances that happened to get the other execution time always reproduce
that other time on every subsequent run.  This is rather disconcerting, given
that I typically want to do runs that take many hours or even days to execute,
and a clearly unnecessary execution-time penalty of more than 50% will be
very costly.

Some details:

  * The program is compiled with g77 and spends virtually all of its time
    doing double-precision floating-point arithmetic; there is very little
    I/O.  It is dynamically linked (libs: libg2c, libm, libc).  The results
    of the calculation are independent of the execution time---that is,
    there are no changes in iteration count or the path to convergence of
    the calculation from one instance to another, even for invocations with
    the disparate execution times.

  * The machine is an AMD Athlon (i386 port) running -current 1.6Q as of
    March 22.  I tried booting an old 1.6A installation from June 9, 2002,
    and the result was the same (the same two wildly different execution
    times could be obtained).

  * I've observed both execution times when running the program in
    different xterms, as well as when running it in different wscons
    virtual terminals.

  My guess is that the two different execution times represent runs in
which the program was loaded with different alignments of double-precision
variables, one efficient and the other not.  I tried compiling with
-malign-double, but that made no difference, and I've tried many
combinations of compile-time optimization flags, including no optimization
at all, with no significant effect.
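
  If stack alignment is indeed the culprit, one mechanism that could
explain the per-shell stickiness is that the two shells pass environments
of different sizes: the environment strings are copied onto the top of the
new process's stack at exec time, so their total size can shift the initial
stack pointer and, with it, the 4-versus-8-byte alignment of every
automatic variable.  That part is conjecture on my side.  Here is a rough,
self-contained C sketch of the kind of penalty I mean; it times the same
summation loop with the array 8-byte aligned and then deliberately
misaligned by 4 bytes (this is a stand-alone test I made up, not my actual
program):

/*
 * align_bench.c -- sum the same array of doubles through an
 * 8-byte-aligned pointer, then through a 4-byte-aligned one.
 * Build with:  cc -O2 -o align_bench align_bench.c
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N    1000000
#define REPS 100

static volatile double sink;  /* keeps the sums from being optimized away */

static double sum_at(char *base, unsigned long offset)
{
	double *a = (double *)(base + offset);
	double s = 0.0;
	int i, r;

	for (i = 0; i < N; i++)
		a[i] = (double)i;
	for (r = 0; r < REPS; r++)
		for (i = 0; i < N; i++)
			s += a[i];
	return s;
}

static double timed(char *base, unsigned long offset)
{
	clock_t t0 = clock();

	sink = sum_at(base, offset);
	return (double)(clock() - t0) / CLOCKS_PER_SEC;
}

int main(void)
{
	/* Over-allocate so the alignment can be chosen by hand. */
	char *buf = malloc(N * sizeof(double) + 16);
	unsigned long off8;

	if (buf == NULL)
		return 1;
	off8 = (8 - (unsigned long)buf % 8) % 8;  /* next 8-byte boundary */

	printf("8-byte aligned: %.2f s\n", timed(buf, off8));
	printf("4-byte aligned: %.2f s\n", timed(buf, off8 + 4));
	free(buf);
	return 0;
}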

  So, my questions are:

  * Is there some way that I can check the alignment during a given
    invocation of the program?  (One possible approach is sketched
    after this list.)

  * If alignment differences do explain the disparity, is there some way
    that I can guarantee favorable alignment?  Is there possibly a bug
    in the OS or the toolchain that causes the alignment to be inefficient
    some of the time?

  * Does anybody have any other suggestions or ideas for resolving this?
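
  By "check the alignment" in the first question, what I have in mind is
something like the small C helper below, linked into the program and called
once at startup to report where a few sample doubles land.  The helper name
and the Fortran hookup are just a sketch of mine; g77 normally maps
"call chkal" to the C symbol "chkal_".

/*
 * chkal.c -- print the alignment of a few sample double variables.
 */
#include <stdio.h>
#include <stdlib.h>

static double static_d;

void chkal_(void)			/* from Fortran: call chkal */
{
	double stack_d;
	double *heap_d = malloc(sizeof(double));

	printf("stack  double at %p (addr mod 8 = %lu)\n",
	    (void *)&stack_d, (unsigned long)&stack_d % 8);
	printf("static double at %p (addr mod 8 = %lu)\n",
	    (void *)&static_d, (unsigned long)&static_d % 8);
	if (heap_d != NULL) {
		printf("heap   double at %p (addr mod 8 = %lu)\n",
		    (void *)heap_d, (unsigned long)heap_d % 8);
		free(heap_d);
	}
}

If the slow invocations consistently reported "addr mod 8 = 4" for the
stack double where the fast ones reported 0, that would go a long way
toward confirming the alignment guess.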

  Thanks!

--Jim