Subject: Re: PC164 memory accesses slow?
To: None <port-alpha@netbsd.org>
From: Joerg Czeranski <jc@joerch.org>
List: port-alpha
Date: 07/31/1999 21:17:22
I wrote:
> It's a rather silly benchmark, BTW, but if it helps optimize NetBSD
> performance, who cares? :)
> 
> I'll be offline for the rest of this week, but I'll get back to it
> next week and make source and binaries available.

After winning the 1.4 installation battle, I was finally able to build
a 1.4 binary and run some more tests.

The source is:
http://www.joerch.org/misc/maxflow.ml

I compiled it with Objective Caml 2.02 (see http://caml.inria.fr):
% ocamlopt -o maxflow maxflow.ml

These are the binaries compiled on NetBSD 1.3.3 with gcc 2.7.2.2 and
on NetBSD 1.4 with egcs 1.1.1:
http://www.joerch.org/misc/maxflow.1.3.3.gz
http://www.joerch.org/misc/maxflow.1.4.gz

Running it with "time ./maxflow" in a /tmp MFS, I got these user times
on an otherwise idle box (system time was ~ 0.2 to 0.3s,
(user+sys)/real >= 99.8%):

NetBSD 1.3.3:
44.8, 46.2, 45.3, 45.2, 44.5, 45.9, 47.3, 44.9, 45.5, 46.1
50.5, 51.0, 50.3, 49.8, 49.7, 50.9, 49.5, 50.7, 50.4, 49.4
51.4, 51.5, 54.6, 54.2, 54.5, 51.1, 57.6, 51.3, 52.1, 50.2

NetBSD 1.4:
45.1, 45.1, 46.0, 46.5, 44.9, 45.2, 46.0, 45.6, 44.7, 45.1
49.8, 46.9, 50.8, 47.3, 45.8, 50.2, 46.6, 45.5, 48.9, 46.9
49.4, 52.6, 50.3, 46.3, 48.9, 46.9, 49.6, 46.5, 46.6, 54.7

Each batch of 10 runs was done without any intervening activity.
Between the batches I compiled kernels and did other things to keep the
box busy and shuffle memory around.
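
To compare the batches more easily, here is a minimal sketch that
prints mean, minimum and maximum user time per batch.  The numbers are
copied from the lists above; the helper names (mean, minimum, maximum,
report) are just made up for this illustration and are not part of
maxflow.ml.

```ocaml
(* Summary statistics for one batch of user times (seconds). *)
let mean xs =
  List.fold_left (+.) 0.0 xs /. float_of_int (List.length xs)

let minimum xs = List.fold_left min infinity xs
let maximum xs = List.fold_left max neg_infinity xs

(* Batches copied from the timings above: three per release. *)
let netbsd_133 = [
  [44.8; 46.2; 45.3; 45.2; 44.5; 45.9; 47.3; 44.9; 45.5; 46.1];
  [50.5; 51.0; 50.3; 49.8; 49.7; 50.9; 49.5; 50.7; 50.4; 49.4];
  [51.4; 51.5; 54.6; 54.2; 54.5; 51.1; 57.6; 51.3; 52.1; 50.2];
]

let netbsd_14 = [
  [45.1; 45.1; 46.0; 46.5; 44.9; 45.2; 46.0; 45.6; 44.7; 45.1];
  [49.8; 46.9; 50.8; 47.3; 45.8; 50.2; 46.6; 45.5; 48.9; 46.9];
  [49.4; 52.6; 50.3; 46.3; 48.9; 46.9; 49.6; 46.5; 46.6; 54.7];
]

(* Print one summary line per batch. *)
let report name batches =
  List.iteri
    (fun i b ->
      Printf.printf "%s batch %d: mean %.2f  min %.1f  max %.1f\n"
        name (i + 1) (mean b) (minimum b) (maximum b))
    batches

let () =
  report "NetBSD 1.3.3" netbsd_133;
  report "NetBSD 1.4" netbsd_14
```

With only ten runs per batch this is a rough summary, not a proper
statistical test, but it makes the spread within a batch and the shift
between batches easier to see.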

The system is a PC164 500MHz with 1MB 3rd level cache, and 128MB RAM
in 256 bit mode.

Booting Tru64 Unix 4.0E from CD and running a binary compiled on 4.0D
resulted in these user times (system ~ 0.25s, (user+sys)/real = 99%
except for the first number):
48.40, 48.28, 48.21, 48.19, 48.16, 48.17

I didn't keep the original binary used for this test, but I have now
recompiled it; it's at:
http://www.joerch.org/misc/maxflow.t64u.gz

AFAIK there's no randomness involved in the program, so all timing
effects should be caused by hardware or kernel anomalies.

I'll try the COMPAT_OSF1 patches for 1.4 later.

Profiling on T64U 4.0D indicates that > 80% of the time is spent
in the functions "scan_node" + "is_unscanned" (local to "scan_a_node") +
"scan".
These three functions plus a couple of garbage collection routines in
the O'Caml runtime add up to 97% of the time spent.
"scan_node" alone uses 48%.  The "if" condition is almost always false,
so most of the work goes into accessing the array elements "labels.(j)".
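
To illustrate that kind of access pattern, here is a hypothetical
sketch.  It is NOT the code from maxflow.ml: the function
count_unlabeled, the array sizes and the stride are all assumptions
chosen for the example.  It only shows a loop whose branch is almost
never taken, so that the run time is dominated by the loads of
labels.(j).

```ocaml
(* Hypothetical sketch, not taken from maxflow.ml: count the entries of
   [neighbors] whose label is negative.  The branch is rarely taken, so
   nearly all the work is the array load labels.(j). *)
let count_unlabeled (labels : int array) (neighbors : int array) : int =
  let hits = ref 0 in
  Array.iter
    (fun j ->
      (* One load from labels per neighbor; the test is rarely true. *)
      if labels.(j) < 0 then incr hits)
    neighbors;
  !hits

let () =
  (* Assumed size: 2^20 entries, larger than a 1MB cache. *)
  let n = 1 lsl 20 in
  (* Every node is labeled except one, so the condition is almost
     always false. *)
  let labels = Array.make n 0 in
  labels.(n / 2) <- -1;
  (* An odd stride visits every index exactly once while jumping around
     in memory; "land (n - 1)" keeps the index in range. *)
  let neighbors = Array.init n (fun i -> (i * 4099) land (n - 1)) in
  let t0 = Sys.time () in
  let hits = count_unlabeled labels neighbors in
  Printf.printf "hits = %d, cpu time = %.3fs\n" hits (Sys.time () -. t0)
```

Changing the stride to 1 turns the same loop into a sequential scan,
which is one way to see how much of the time depends on the memory
access order rather than on the instructions executed.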

happy benchmarking :-)
joerch