Subject: Re: PC164 memory bus speed (was: pciide performance on alpha)
To: None <>
From: Andrew Gillham <>
List: port-alpha
Date: 11/02/1999 14:39:49
Thor Lancelot Simon writes:
> Okay, so this is an *old* thread.  Sorry 'bout that.
> However: you should consider the high cost of incorrectly estimating the
> cache and main memory latencies in the compiler.  I suspect that the reason
> almost *all* code seems to run much faster with gcc -mcpu=21164a isn't just
> the use of BWX instructions but also the use of cache and memory latency
> numbers that are a lot closer to reality for the pc164.
> Since we know how fast the L1, L2, and L3 caches are -- L1 and L2 are the
> same for all 21164, and the speed of the L3 parts should be stamped on
> them -- and the memory's 60ns, it should be possible to feed gcc figures
> that are exactly right, and I'd be curious to see what this does for the
> various memory-sensitive benchmarks people have been disappointed by.
> FWIW the pc164 that's now achieved the highest STREAM
> benchmark result I'd ever seen at the time, several hundred megabytes per
> second in 256-bit mode.  So the memory bandwidth of the pc164 is probably
> okay, and like people have noticed it's gcc that sucks.

Does anyone have a list of PC164 optimizations that should be turned on?
I just got a 500MHz PC164, and while I'm fairly impressed with its
speed (OK, I also have a 166MHz Multia), I was surprised that my 433MHz
Celeron generally beats it.  A 'make build' of -current (same date) takes
1:25:00 on my Celeron 433 and 2:30:00 on my Alpha 500.  Both have fast
EIDE hard drives (8GB with a 512KB buffer on the Celeron, 20GB with a 2MB
buffer on the Alpha).  Admittedly the Celeron has a heck of a lot more RAM
(256MB vs. 64MB), and I haven't tested it with only 64MB (or only 48MB).

The Celeron outperforms the Alpha at dhrystones (~800K vs. ~540K) and
at whetstones (250 MIPS vs. 180 MIPS).

The only test (so far) that the Alpha wins is the DES part of rc5des
(4.9M vs. 6.0M).
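
To put the figures above side by side, here is a rough sketch of the
arithmetic in Python.  It assumes the Alpha's build time is two and a
half hours and that each pair of numbers is listed Celeron first, Alpha
second; the dictionary keys are just labels made up for this example.

```python
def to_seconds(hms: str) -> int:
    """Convert an h:mm:ss string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds

# Ratios greater than 1.0 mean the Celeron 433 is ahead of the Alpha 500.
ratios = {
    # Build time: lower is better, so divide Alpha time by Celeron time.
    "make build": to_seconds("2:30:00") / to_seconds("1:25:00"),
    # Benchmark scores: higher is better, so divide Celeron by Alpha.
    "dhrystone": 800 / 540,
    "whetstone": 250 / 180,
}

for name, ratio in ratios.items():
    print(f"{name}: Celeron ahead by {ratio:.2f}x")
```

By this reckoning the gap is widest on the full build (about 1.76x) and
narrower on the synthetic benchmarks (about 1.48x and 1.39x), which hints
that RAM and disk may account for part of the build difference.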

I haven't had a chance to run lmbench on both systems, or to plug only
a single 64MB SDRAM into the Celeron and rerun make build.  Also, I don't
have any more SIMMs for the Alpha, so I can only run in 128-bit memory
mode (with only 64MB).

If there are some optimizations (like -mcpu=21164a) that should be enabled,
I would love to hear about them.  What options are needed to specify
the L3 cache latency?
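
For reference, GCC's documented Alpha options include
-mmemory-latency=time, where time is either a number of clock cycles or
one of L1, L2, L3, or main.  Whether a given gcc build accepts it is
something to check locally.  A minimal sketch of turning a latency in
nanoseconds into a cycle count, assuming the 60ns memory figure quoted
above and a 500MHz clock (the function name is made up for this example):

```python
import math

def latency_cycles(latency_ns: float, clock_mhz: float) -> int:
    """Round a latency in nanoseconds up to whole CPU clock cycles."""
    # cycles = nanoseconds * (cycles per nanosecond) = ns * MHz / 1000
    return math.ceil(latency_ns * clock_mhz / 1000.0)

# 60 ns DRAM on a 500 MHz CPU.  This counts only the rated DRAM access
# time; real load-to-use latency also includes memory controller and bus
# overhead, so treat the result as a lower bound.
main_memory_cycles = latency_cycles(60, 500)
print(f"-mmemory-latency={main_memory_cycles}")
```

A figure measured with something like lmbench would be a better input
than the speed rating stamped on the parts.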

I'm a little bit disappointed that my 500MHz 64-bit Alpha beast is getting
whipped by a 433MHz chip with 128KB of L2 cache. :(

Tuning tips, performance numbers, or CISC vs. RISC flames accepted. :)

Andrew Gillham                            | This space left blank
                                          | inadvertently.
I speak for myself, not for my employer.  | Contact the publisher.