Subject: Re: PII vs 21164
To: Christian von Kleist <cvk@zybx.com>
From: Francis. Javier Mesa <javi@cse.ucsc.edu>
List: port-alpha
Date: 05/16/2003 01:18:19
>
>      One thing that comes to mind are the cache differences between the
> 21164 and the PII.  The 21164 has 8kb of L1 cache and 96kb of L2
> cache, and the PII has 32kb of L1 and 256- to 2048kb of L2 (yours
> probably has 256- or 512kb).

The L2 in the Alpha is actually on chip, while the L2 for the PII is off chip
and runs at 1/2 the processor speed. So the smaller L2 in the Alpha is
not as bad as it seems: even though it is smaller than the 512K that
the PII uses for L2, it has much lower latency.
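
To put rough numbers on that, here is a small sketch of the usual
average-access-time arithmetic. Every latency and miss rate below is made
up for illustration; none of them is a measured 21164 or PII figure, and
the function name amat is just my own.

```python
def amat(l1_hit, l1_miss_rate, l2_hit, l2_miss_rate, mem_latency):
    """Average memory access time, in CPU cycles, for a two-level cache.

    Every access pays the L1 hit time; L1 misses also pay the L2 hit time;
    L2 misses additionally pay the main-memory latency.
    """
    return l1_hit + l1_miss_rate * (l2_hit + l2_miss_rate * mem_latency)


# Hypothetical figures only: a small on-chip L2 (fast, but misses more often)
# versus a large off-chip L2 at half the core clock (slow, but misses less).
small_fast_l2 = amat(l1_hit=1, l1_miss_rate=0.10,
                     l2_hit=6, l2_miss_rate=0.30, mem_latency=100)
large_slow_l2 = amat(l1_hit=1, l1_miss_rate=0.10,
                     l2_hit=20, l2_miss_rate=0.15, mem_latency=100)

print(f"small on-chip L2 : {small_fast_l2:.2f} cycles per access")
print(f"large off-chip L2: {large_slow_l2:.2f} cycles per access")
```

With these invented numbers the two come out at about 4.6 and 4.5 cycles
per access, i.e. roughly a wash despite the big difference in size. The
sketch also keeps the L1 miss rate the same on both sides, which the real
chips would not.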

>   Size isn't all that matters in caches,
> but assuming the cache hardware works about equally well in both
> processors the PII should have a distinct advantage with its larger
> L2 cache, even if it's only running at 66MHz (could also be 100MHz).

The L2 cache for the PII is actually on the same PCB as the
processor, and it runs at 1/2 the internal speed of the PII core, not at
the 66MHz or 100MHz bus speed.

> Fortunately the 21164 can also operate an L3 cache of considerable
> size on the motherboard.  When I put a 2mb beta cache in my PWS500a I
> saw a near doubling of compile performance!  If you don't have a beta
> cache module in your machine installing one might really help.

As a rule of thumb RISC machines need far larger caches than CISC
machines. In fact cache is fundamental to seeing the benefits of RISC. One of
the reasons is what you pointed out: RISC machines are far more memory
hungry than CISC when it comes to actual instruction requirements.

CISC machines were designed to deal with limited main memories, since RAM
was rather pricey and slow in the old days. So basically you wanted to do as
few memory accesses as possible, both to force programs to be as compact
as possible (ergo reducing RAM requirements) and to reduce the delay
associated with memory accesses (most CISC machines were not pipelined). Those
CISC instructions, once fetched, are actually decoded into a microcode
sequence, so basically you should think of CISC as a sort of "instruction
compression". RISC basically gets rid of the overhead of decoding into
microcode; instead, what you fetch is the microcode itself. By using a tuned
memory hierarchy (caches) and pipelining, RISC can make up for its increased
instruction bandwidth requirements over its CISC counterparts.

>      Finally, there's a big difference in code size between the 21164 and
> the PII.  The PII has a huge collection of CISC instructions that
> result in small assembly code size because a smaller number of
> assembly instructions are required per line of source code.  Small
> assembly code size means less memory traffic during compilation and
> assembly and less disk traffic when the resultant object code is
> written to disk.  In contrast the 21164 has a very small RISC
> instruction set that requires more machine code instructions per line
> of source code.  Also, the 21164 creates a 64-bit instruction from
> each of the assembly instructions.

You lost me here: the assembly instruction is what the Alpha actually
uses... it does not generate another instruction out of those assembly
instructions. The instruction size for the Alpha is actually 32 bits; the
native data path (data size), however, is 64 bits. This is a common
misconception.
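
As a concrete illustration, the sketch below packs and unpacks one 32-bit
instruction word using the memory-format field layout as I remember it
(6-bit opcode in bits 31:26, two 5-bit register fields, 16-bit signed
displacement). The function names are my own, and the opcode value 0x29
(which I believe is LDQ) should be treated as an assumption rather than
checked against the manual.

```python
def encode_memory_format(opcode, ra, rb, disp):
    """Pack four fields into one 32-bit instruction word."""
    return ((opcode & 0x3F) << 26) | ((ra & 0x1F) << 21) \
        | ((rb & 0x1F) << 16) | (disp & 0xFFFF)


def decode_memory_format(word):
    """Split a 32-bit memory-format instruction word back into its fields."""
    if not 0 <= word < 2**32:
        raise ValueError("an instruction word must fit in exactly 32 bits")
    disp = word & 0xFFFF
    if disp & 0x8000:          # the displacement is a signed 16-bit value
        disp -= 0x10000
    return {
        "opcode": (word >> 26) & 0x3F,
        "ra": (word >> 21) & 0x1F,
        "rb": (word >> 16) & 0x1F,
        "disp": disp,
    }


# Assumed example: opcode 0x29 with ra=1, rb=2, displacement 8.
word = encode_memory_format(0x29, 1, 2, 8)
print(f"{word:#010x}", decode_memory_format(word))
```

The instruction itself is one 32-bit word; the 64 bits only come into play
in the data that the instruction loads, stores, or operates on.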

>   CISC instructions can be
> considerably longer than 64 bits, but the average length is probably
> pretty close to something like 64 bits.  That means the PII is moving
> much less information from disk to main memory to cache to processor
> during both the compile and assembly stages.

The problem with CISC is that, although it is true that it has lower
instruction bandwidth requirements, the fact that instructions are not
regular (i.e. not fixed in size) actually presents a significant overhead
(e.g. fetching a very long instruction may require several memory
accesses, and the IM unit needs to know how many memory accesses are
needed), whereas for RISC machines there is no such overhead since every
instruction takes the same "amount" of fetching.
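
A toy sketch of the difference, in Python. The variable-length format here
is entirely hypothetical (the first byte of each instruction holds its
total length); it is not the x86 encoding, which is a good deal messier.
The point is only that fixed-width boundaries are known up front, while
variable-length boundaries have to be discovered one instruction at a time.

```python
def fixed_offsets(code, width=4):
    """Start offsets of fixed-width instructions.

    These are computed from the length alone; no instruction bytes are read.
    """
    return list(range(0, len(code), width))


def variable_offsets(code):
    """Start offsets in a toy variable-length encoding.

    Hypothetical format: byte 0 of each instruction is its total length in
    bytes, so instruction N+1 cannot be located until N has been examined.
    """
    offsets = []
    pos = 0
    while pos < len(code):
        length = code[pos]
        if length == 0 or pos + length > len(code):
            raise ValueError(f"bad instruction length at offset {pos}")
        offsets.append(pos)
        pos += length
    return offsets


fixed_stream = bytes(16)                                  # four 4-byte slots
variable_stream = bytes([1, 3, 0, 0, 2, 0, 5, 0, 0, 0, 0])  # lengths 1, 3, 2, 5

print(fixed_offsets(fixed_stream))
print(variable_offsets(variable_stream))
```

In hardware the sequential dependency in variable_offsets is what the
fetch and decode logic has to work around.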

Also note that the PII is actually a RISC core. The CISC instructions are
actually broken down into RISC subinstructions, which almost behave
like microcode really... but with RISC instructions as the microcode. So
that whole decoding into smaller RISC instructionlets also presents a
significant overhead.


>      The 21164 is definitely as fast as the PII in integer performance.
> (Of course it is way faster at FP performance, but aside from using
> the FP registers as a kind of cache FP is not an issue here.)  In the
> end I think the only explanation for the longer compile time on your
> 21164 is that more memory transfer is being done.  That includes all
> three levels: cache, main memory, and disk.

There are 2 issues. First off, the PII's integer performance is quite up to
par with older Alphas: Intel concentrated mostly on integer performance, and
compiling is an integer-intensive process. Second, GCC is far more
optimized for x86 than for AXP, so there is also a significant difference in
compile times right there.