Subject: Re: NetBSD/i386 processor recommendation
To: Michael L. VanLoon -- HeadCandy.com <michaelv@MindBender.serv.net>
From: Aaron Brown <abrown@cs.berkeley.edu>
List: port-i386
Date: 08/07/1997 19:16:28
"Michael L. VanLoon -- HeadCandy.com" <michaelv@MindBender.serv.net> writes:

[snip P5 vs. P6 discussion]

> Actually, the FreeBSD folks have done a lot of tweaking on the P5
> bcopy code, and have been able to make it vastly out-perform the P6 on
> this single benchmark.  It might have more to do with the Natoma
> vs. the Triton-II chipset.  I don't know the details.  However,
> highly optimized straight memory-to-memory copies can be made faster
> on a P5 than on a Pentium Pro.  This is about the only thing a P5 can
> do faster, however.

If these are large copies (bigger than the L2 cache size), this is
probably due to the P6's horrible streaming DRAM write performance.
The chip insists on doing bus invalidate transactions (or similar)
even when running in a uniprocessor system; the extra transactions
(plus the bus turn-around overhead) prevent the CPU from bursting 
DRAM-page-sized writes consecutively to the memory system.
Although the Pentium can't do the fancy combining and data 
reordering, on streaming operations like large copies, its
write performance is much better than the Pro's (with equivalent
EDO-class DRAM), and so even a dumb bcopy can be made to run faster
on it than on a Pro.
(See http://www.eecs.harvard.edu/vino/perf/hbench and follow the
links to the Sigmetrics '97 paper for more information and a comparison
of the processors.)

If these numbers refer to small copies, then I can't believe this
is true. The L1/L2 caches on the Pro are so vastly superior in
performance to those on the Pentium that any cache-sized operations
will win on the P6. This is one of the reasons that the Pro performs
so well, and it still amazes me that Intel screwed over the L2 cache
in going from the Pro to Pentium-II...
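
If anyone wants to check the small-copy vs. large-copy distinction on
their own hardware, something like the rough sketch below should do.
This is not the hbench code or the FreeBSD bcopy; the function name
copy_bandwidth_mb_s and its parameters are made up for illustration,
and it times plain memcpy with clock(), so treat the numbers as
ballpark only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper (not from hbench): copy `size` bytes repeatedly
 * until about `total_bytes` have been moved, and return the observed
 * bandwidth in MB/s.  Returns -1.0 on allocation failure or a bad copy,
 * and 0.0 if the run was too short for clock() to measure.
 */
double copy_bandwidth_mb_s(size_t size, size_t total_bytes)
{
    assert(size > 0 && total_bytes >= size);

    /* Call through a volatile pointer so the compiler cannot drop the copies. */
    void *(*volatile copy_fn)(void *, const void *, size_t) = memcpy;

    char *src = malloc(size);
    char *dst = malloc(size);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }

    /* Touch both buffers first so page faults are not part of the timing. */
    memset(src, 0xA5, size);
    memset(dst, 0, size);

    size_t reps = total_bytes / size;
    clock_t start = clock();
    for (size_t i = 0; i < reps; i++)
        copy_fn(dst, src, size);
    clock_t end = clock();

    /* Sanity check: the destination must match the source. */
    int ok = (memcmp(dst, src, size) == 0);
    free(src);
    free(dst);
    if (!ok)
        return -1.0;

    double seconds = (double)(end - start) / CLOCKS_PER_SEC;
    if (seconds <= 0.0)
        return 0.0;

    return ((double)reps * (double)size) / (1024.0 * 1024.0) / seconds;
}
```

Call it once with a size that fits comfortably in cache (say 8 KB)
and once with a size well beyond the L2 (say 8 MB), moving the same
total (say 64 MB) in both cases, and compare the two results across
machines. Those sizes are my guesses; pick ones that suit your caches.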

--Aaron