Subject: Re: NetBSD/i386 processor recommendation
To: None <michaelv@MindBender.serv.net, ross@teraflop.com>
From: Ross Harvey <ross@teraflop.com>
List: port-i386
Date: 08/05/1997 22:44:43
> Actually, the FreeBSD folks have done a lot of tweaking on the P5
> bcopy code, and have been able to make it vastly out-perform the P6 on
> this single benchmark. It might have more to do with the Natoma
> vs. the Triton-II chipset. I don't know the details. However,
> highly optimized straight memory-to-memory copies can be made faster
> on a P5 than on a Pentium Pro. This is about the only thing a P5 can
> do faster, however.
Ah, you have referenced the RFO problem. Aaron Brown and Margo Seltzer
wrote a paper at Harvard analyzing NetBSD performance. See:
www.eecs.harvard.edu/~vino/perf/hbench/sigmetrics/hbench.html
The P6 does unnecessary read-for-ownership (RFO) cycles, reading each cache
line from memory before writing it, even when not in SMP mode.
It ruins the main memory write performance. However, the reads and in-cache
writes are so much faster that this only affects things like long bcopys.
Overall performance is still very impressive compared to a P5. (I happen
to like the DEC Alpha as well, naturally, and it's a lot better on floating
point.)
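To make the RFO cost a bit more concrete, here is a rough sketch in C. It
assumes a write-allocate cache where every access misses, so copying n bytes
moves the source read, an RFO read of each destination line, and the later
write-back of that line over the bus: 3n bytes instead of 2n. The helpers
copy_bus_bytes() and copy_bandwidth_mb_s() are made-up names for
illustration, not the hbench code, and timing with clock() is a
simplification.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Sink for one byte per iteration so the copies are not optimized away. */
static volatile unsigned char sink;

/*
 * Hypothetical model of main-memory traffic for copying n bytes when
 * every access misses the cache. With read-for-ownership the bus
 * carries the source read, an RFO read of each destination line, and
 * the eventual write-back of that line: 3n. Without RFO: 2n.
 */
size_t copy_bus_bytes(size_t n, int rfo)
{
    return rfo ? 3 * n : 2 * n;
}

/*
 * Hypothetical helper: time `iters` memcpy() calls of `bytes` bytes
 * and return the copy rate in MB/s (1 MB = 1e6 bytes).
 * Returns -1.0 for a zero size, a non-positive count, or malloc failure.
 */
double copy_bandwidth_mb_s(size_t bytes, int iters)
{
    if (bytes == 0 || iters <= 0)
        return -1.0;

    char *src = malloc(bytes);
    char *dst = malloc(bytes);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 0x5a, bytes);
    memset(dst, 0, bytes);          /* fault the pages in before timing */

    clock_t start = clock();
    for (int i = 0; i < iters; i++) {
        memcpy(dst, src, bytes);
        sink = (unsigned char)dst[(size_t)i % bytes];  /* observe result */
    }
    clock_t end = clock();

    free(src);
    free(dst);

    double secs = (double)(end - start) / CLOCKS_PER_SEC;
    if (secs <= 0.0)                /* clamp to the timer's resolution */
        secs = 1.0 / CLOCKS_PER_SEC;
    return (double)bytes * iters / secs / 1e6;
}
```

Comparing something like copy_bandwidth_mb_s(8 * 1024, 20000) against
copy_bandwidth_mb_s(16 * 1024 * 1024, 20) should show the in-cache vs.
main-memory gap; the actual numbers depend on the machine, its cache sizes,
and the libc memcpy().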
Anyway, to quote from the paper:
  Returning to the data in Figure 3, we see that the most spectacular
  feature is the performance of the Pentium Pro system. The Pro-200
  exhibits a strange combination of impressive across-the-board memory
  bandwidth, except for uncharacteristically poor main memory write
  bandwidth.
This won't affect you when you are in cache. Also, the paper puts main
memory write bandwidth at only 18% slower than a particular P5, not
"vastly" worse.
----------------------
Ross Harvey Avalon Computer Systems, Inc. ross@teraflop.com
Santa Barbara http://www.teraflop.com