Subject: Re: NetBSD/i386 processor recommendation
To: None <michaelv@MindBender.serv.net, ross@teraflop.com>
From: Ross Harvey <ross@teraflop.com>
List: port-i386
Date: 08/05/1997 22:44:43
 > Actually, the FreeBSD folks have done a lot of tweaking on the P5
 > bcopy code, and have been able to make it vastly out-perform the P6 on
 > this single benchmark.  It might have more to do with the Natoma
 > vs. the Triton-II chipset.  I don't know the details.  However,
 > highly optimized straight memory-to-memory copies can be made faster
 > on a P5 than on a Pentium Pro.  This is about the only thing a P5 can
 > do faster, however.

Ah, you have referenced the RFO problem. Aaron Brown and Margo Seltzer
wrote a paper at Harvard analyzing NetBSD performance. See:

	www.eecs.harvard.edu/~vino/perf/hbench/sigmetrics/hbench.html

The P6 does unnecessary read-for-ownership (RFO) cycles even when not in SMP
mode. This ruins main memory write performance. However, reads and in-cache
writes are so much faster that it only affects things like long bcopys.
Overall performance is still very impressive compared to a P5. (I happen
to like the DEC Alpha as well, naturally, and it's a lot better on floating
point.)

Anyway, to quote from the paper:

	Returning to the data in Figure 3, we see that the most
	spectacular feature is the performance of the Pentium Pro
	system. The Pro-200 exhibits a strange combination of impressive
	across-the-board memory bandwidth, except for
	uncharacteristically poor main memory write bandwidth.

This won't affect you when you are in cache. The paper says that this
operation is 18% slower than on a particular P5, though, not "vastly" worse.
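
If you want to see the in-cache vs. main-memory write difference on your
own box, here is a minimal sketch in C. It is not the benchmark from the
paper; the function name write_bandwidth_mb_s and its parameters are my own
invention for illustration. It assumes that timing repeated memset() calls
with clock() is a good enough proxy for write bandwidth, which may not hold
if your libc's memset uses stores that bypass the cache.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper (not from the paper): fill a buffer of `bytes`
 * bytes `passes` times and return the apparent write bandwidth in MB/s.
 * Returns a negative value on bad arguments, allocation failure, or
 * when the run was too short for clock() to measure.
 */
double write_bandwidth_mb_s(size_t bytes, int passes)
{
    /* Call memset through a volatile pointer so the compiler cannot
     * optimize the repeated fills away. */
    void *(*volatile fill)(void *, int, size_t) = memset;
    unsigned char *buf;
    clock_t start, end;
    double seconds;
    int i;

    if (bytes == 0 || passes <= 0)
        return -1.0;

    buf = malloc(bytes);
    if (buf == NULL)
        return -1.0;

    /* Touch every page once so page faults are not part of the timing. */
    fill(buf, 0, bytes);

    start = clock();
    for (i = 0; i < passes; i++)
        fill(buf, i & 0xff, bytes);
    end = clock();

    free(buf);

    seconds = (double)(end - start) / CLOCKS_PER_SEC;
    if (seconds <= 0.0)
        return -1.0;

    return ((double)bytes * passes) / (1024.0 * 1024.0) / seconds;
}
```

Call it once with a buffer of a few KB (should stay in cache) and once with
a buffer of several MB (should go to main memory), with enough passes that
each run takes a measurable fraction of a second, and compare the two
numbers. The absolute figures depend on the compiler, libc and chipset, so
only the ratio between the two cases is interesting.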
----------------------
Ross Harvey	Avalon Computer Systems, Inc.		  ross@teraflop.com
		Santa Barbara	 		    http://www.teraflop.com