port-macppc: Re: ppc benchmarks, quick and dirty. 604ev, g3, mips

Subject: Re: ppc benchmarks, quick and dirty. 604ev, g3, mips
To: Michael <macallan18@earthlink.net>
From: Riccardo Mottola <rollei@tiscalinet.it>
List: port-macppc
Date: 03/12/2005 22:49:36
Hey,


> Well, I ran a few benchmarks too a while ago - the results were pretty different from yours. I had a 300MHz G3 in my S900 vs. a 300MHz 604e in a Motorola PowerStack II, both with 1MB L2 cache - at 150MHz for the G3, 66MHz for the 604e. The Mac runs NetBSD, the PS AIX, for both I used gcc 3.3.something. In all FPU-bound benchmarks the 604e had a 10%-20% edge over the G3, with integer-bound stuff the G3 was slightly faster. Memory-bound tests of course were vastly faster on the G3 as long as data fit into L2 cache, after that the 604e is faster again (it runs its bus at 66MHz in the PS, vs. 50MHz in the Mac)
That is interesting.
Well, in my case the G3 B/W surely has faster memory and a faster system bus,
but the L2 cache should run at 1:2 on both CPUs, shouldn't it? They have
exactly the same clock rate...

> And just for fun - when I use the xlc compiler on AIX the 604 is suddenly more than twice as fast as the G3 in FPU-bound tasks and has an edge almost everywhere else, but that's hardly fair ;-)
> So - how do we convince IBM to port xlc to NetBSD? The one I used was quite archaic, 5.0.2 or so.
Yeah, I know, gcc... If you look at the bottom you can see how MIPSpro
edges out gcc.
However, it was interesting to see that even G3-specific optimizations
didn't improve anything.

I have other programs where CPU-specific optimizations do help; this one
just shows that my program is "simple" enough that, once the code is
scheduled correctly, CPU-specific tuning has little left to gain.
 
> > 4096 samples, the data set is
> > 2+4 arrays of doubles of 4096 samples
> > that is 6 * sizeof(double) * 4096 = 6 * 6 * 4096 = 196608KBytes assuming
> > 8 bytes doubles.
> Umm, sizeof(double) is 8 if I remember correctly, so 48*4KB is certainly small enough for the L2.

Yes, double is 8 bytes. If you calculate it, the total of 196608 bytes
(192 KB) is correct; the second 6 is just a typo, and so is "KBytes".
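A quick way to double-check the size arithmetic is a few lines of C. This is a minimal sketch: the helper names working_set_bytes and fits_in_l2 are made up for illustration, and the 1 MB L2 size it is compared against is an assumption taken from Michael's machines, not a measured value for the 9500 or the G3 here.

```c
#include <assert.h>
#include <stddef.h>

/* The FFT benchmark's data set as described in the thread:
   2 + 4 arrays of 4096 doubles. */
#define N_ARRAYS  (2 + 4)
#define N_SAMPLES 4096

/* The 196608-byte total assumes 8-byte doubles. */
static_assert(sizeof(double) == 8, "expected 8-byte doubles");

/* Hypothetical helper: total bytes occupied by the benchmark's arrays. */
size_t working_set_bytes(size_t n_arrays, size_t n_samples)
{
    return n_arrays * sizeof(double) * n_samples;
}

/* Hypothetical helper: nonzero if the working set fits in an L2
   cache of l2_bytes (ignoring associativity and conflict misses). */
int fits_in_l2(size_t working_set, size_t l2_bytes)
{
    return working_set <= l2_bytes;
}
```

With 8-byte doubles, working_set_bytes(N_ARRAYS, N_SAMPLES) gives 196608 bytes (192 KB), comfortably under an assumed 1 MB L2.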

 
> > I don't understand why the ppc604ev seems so slow. Everybody thought it
> > would be faster than the g3! and the fft calculus should be more cpu
> > bound than memory bound and the dataset should fit in the cache.
> The dataset is small, fits into the L2 cache so I'd expect that the G3 has an edge because it runs the cache at a much higher speed than the 604, even the 604ev runs the cache at only 100MHz, the G3 usually runs it at about half the CPU speed.

Ah, I thought the 604ev ran its cache at 1:2 like the G3. That may explain
the difference. The FFT code executes many FPU instructions, but all
the data is kept in arrays, so cache speed is really essential. Maybe
another test is needed, then.
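One possible form for such a test is a working-set sweep: time the same summation loop over arrays of growing size and compare the time per element. This is a minimal C sketch under stated assumptions, not the FFT benchmark itself; time_sweep and sweep_working_sets are hypothetical names, and the 64 KB to 4 MB range and the roughly 64 MB read per size are arbitrary choices. If the L2 is enabled and fast, ns/element should stay low up to roughly the L2 size and then step up; a flat, slow curve from the start would hint at a disabled or misconfigured cache, which also bears on the firmware question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Sum the n doubles in a, `passes` times over, and return the CPU time
   per pass in seconds. Four independent accumulators keep the loop from
   being limited purely by FPU add latency; the checksum is stored in
   *sum so the compiler cannot discard the loop. */
double time_sweep(const double *a, size_t n, int passes, double *sum)
{
    assert(passes > 0);
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    clock_t t0 = clock();
    for (int p = 0; p < passes; p++) {
        size_t i = 0;
        for (; i + 4 <= n; i += 4) {
            s0 += a[i];
            s1 += a[i + 1];
            s2 += a[i + 2];
            s3 += a[i + 3];
        }
        for (; i < n; i++)              /* leftover elements */
            s0 += a[i];
    }
    clock_t t1 = clock();
    *sum = s0 + s1 + s2 + s3;
    return (double)(t1 - t0) / CLOCKS_PER_SEC / passes;
}

/* Hypothetical driver: sweep working sets from 64 KB to 4 MB, reading
   roughly 64 MB in total per size, and print the time per element.
   A step up in ns/element past the L2 size suggests the L2 is working. */
void sweep_working_sets(void)
{
    for (size_t kb = 64; kb <= 4096; kb *= 2) {
        size_t n = kb * 1024 / sizeof(double);
        double *a = malloc(n * sizeof *a);
        if (a == NULL)
            return;
        for (size_t i = 0; i < n; i++)
            a[i] = 1.0;                 /* also touches every page once */
        int passes = (int)((64u * 1024 * 1024) / (kb * 1024));
        double sum;
        double t = time_sweep(a, n, passes, &sum);
        printf("%5zu KB: %.2f ns/element (checksum %.0f)\n",
               kb, t * 1e9 / (double)n, sum);
        free(a);
    }
}
```

Calling sweep_working_sets() from main() on each machine, built with the same compiler flags as the FFT benchmark, would give comparable curves. clock() has coarse resolution, which is why each size is repeated until about 64 MB has been read.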

 
> > Could
> > the os do something wrong (the 604 is not certified for 10.15 macos)
> > like not enabling or enabling "badly" the cpu L2 cache?
> Hmm, does -mcpu=604e -mtune=604e change anything? The cache should be enabled by the firmware, but who knows, it's OF 1.0.5 after all... how does the benchmark behave under NetBSD? There we'd at least know if the cache is active.

Apple's compiler doesn't have any special 604e/604ev options here.
Anyway, the test showed that in this case CPU-specific optimizations didn't
gain much for either the G3 or the 604ev.

Unfortunately I was never able to run NetBSD on this box :( And my
latest attempts to compile a kernel on the 9500 also failed, with SCSI
problems as in the old days. Stupid.

-R