Subject: Re: NetBSD/i386 processor recommendation
To: Martin Husemann <martin@rumolt.teuto.de>
From: Jonathan Stone <jonathan@DSG.Stanford.EDU>
List: port-i386
Date: 08/07/1997 03:46:16
 >> The P6 does unnecessary read-for-ownership cycles even when not in SMP mode.
 >> It ruins the main memory write performance. However, the reads and in-cache
 >> writes are so much faster that this only affects things like long bcopys.
 >
 >>Shouldn't bcopy and friends use a read after every cache-line full of data
 >>written (considering alignment and all that fun)? IIRC that would cause a
 >>burst-read of the cache-line, a fill in the cache in exclusive mode and then
 >>a burst-write of the whole line back to main memory. Should be faster since
 >>P5...
Uh... I collaborated on Kevin Lai's USENIX paper, which points out the
advantages of a read-cache-line-before-write strategy on Pentium chips
with a no-allocate-on-write-miss cache policy.
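
For anyone who hasn't seen the trick, here is a rough sketch of the idea
in portable C. It is only an illustration, not the code from the paper
or from either bcopy() mentioned in this message. The name rbw_copy(),
the 32-byte CACHE_LINE value, and the byte-at-a-time inner loop are all
assumptions made for the example, and it assumes the two buffers do not
overlap. The argument order follows bcopy(src, dst, len).

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed cache-line size; 32 bytes is typical of P5/P6-class chips. */
#define CACHE_LINE 32

/*
 * Hypothetical read-before-write copy for non-overlapping buffers.
 * Before writing each destination line, read one byte of it, so that a
 * cache which does not allocate on a write miss pulls the line in and
 * the following stores hit in the cache.
 */
void rbw_copy(const void *src, void *dst, size_t len)
{
    const unsigned char *s = src;
    unsigned char *d = dst;
    size_t i;

    /* Copy leading bytes until the destination is line-aligned. */
    while (len > 0 && ((uintptr_t)d % CACHE_LINE) != 0) {
        *d++ = *s++;
        len--;
    }

    /* Copy whole destination lines, touching each one first. */
    while (len >= CACHE_LINE) {
        /* volatile keeps the compiler from discarding the dummy read. */
        (void)*(volatile unsigned char *)d;
        for (i = 0; i < CACHE_LINE; i++)
            d[i] = s[i];
        s += CACHE_LINE;
        d += CACHE_LINE;
        len -= CACHE_LINE;
    }

    /* Copy whatever is left over. */
    while (len > 0) {
        *d++ = *s++;
        len--;
    }
}
```

Whether the dummy read actually helps depends on the chip and its cache
policy, so measure before trusting it. A real version would also move
words rather than bytes in the inner loop.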
I hacked up a bcopy() based on Kevin's and then ported the FreeBSD
bcopy(), which does marginally (or even, if you prefer) better by
using the FPU registers to issue 64-bit copies.
I've sent copies of that to Frank van der Linden (Charles Hannum, the
principal i386 portmaster, discards all e-mail from me, so I haven't
sent him a copy).
Caveat emptor.