Subject: Re: Cyberstorm (and Amiga cache)
To: None <amiga@NetBSD.ORG>
From: Jeffrey William Davis <c23jwd@eng.delcoelect.com>
List: amiga
Date: 03/28/1996 16:19:36
> c23jwd@eng.delcoelect.com (Jeffrey William Davis) writes:
> 
> >at a finite speed!  Clocked at 40MHz, the DRAM subsystem is able to process
> >all accesses immediately without any kind of delay (zero wait state).
> 
> At 40MHz the CPU can request a longword in 25ns, this requires at least
> 20ns DRAM. Do you still believe that the Warpengine has zero wait states ?
> The fastest reasonable burst cycle is probably a 2-1-1-1 burst. At 40MHz
> this yields a possible memory bandwidth of 128MByte/s but the WE memory
> system just delivers about 50MByte/s.

First of all, your information is completely misleading.  The 68040 requires
three clocks for normal bus accesses.  At 40MHz, 1 clock = 25ns, which
means a 'normal' bus cycle takes no less than 75ns!  Hence, 60ns DRAM
keeps up fine at zero wait states.

Can you find in the 68040 databook where it can do a 2-1-1-1 burst?  I
believe 3-2-2-2 is the limit.  This yields a theoretical maximum bandwidth
of 67.81MBytes/sec.  In practice, the Warp Engine is achieving 57+ MBytes/sec,
when talking to the local RAM.
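
For anyone who wants to check the arithmetic, here is a rough sketch of the
burst bandwidth calculation. It assumes a 32-bit data bus (4 bytes per beat)
and a 16-byte line moved in four beats; the function name and parameters are
made up for illustration. Note that 67.8 is in units of 2^20 bytes, while the
128MByte/s quoted above for 2-1-1-1 is in units of 10^6 bytes.

```python
def burst_bandwidth(clock_mhz, pattern, bytes_per_beat=4):
    """Peak bytes/second for one burst pattern, e.g. (3, 2, 2, 2).

    Assumption: every beat moves bytes_per_beat bytes and bursts run
    back to back with no idle clocks in between.
    """
    clocks = sum(pattern)                      # total clocks per burst
    bytes_moved = bytes_per_beat * len(pattern)
    seconds = clocks / (clock_mhz * 1e6)       # duration of one burst
    return bytes_moved / seconds

MIB = 2 ** 20

bw_3222 = burst_bandwidth(40, (3, 2, 2, 2))    # 9 clocks = 225ns per 16 bytes
bw_2111 = burst_bandwidth(40, (2, 1, 1, 1))    # 5 clocks = 125ns per 16 bytes

print(f"3-2-2-2: {bw_3222 / MIB:.2f} MiB/s ({bw_3222 / 1e6:.1f} MB/s)")
print(f"2-1-1-1: {bw_2111 / MIB:.2f} MiB/s ({bw_2111 / 1e6:.1f} MB/s)")
print(f"ratio:   {bw_2111 / bw_3222:.2f}")
```

The ratio between the two patterns is simply 9 clocks over 5 clocks, or 1.8.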

I will review my databooks and find the true capability of the MC040.
Assuming a 2-1-1-1 burst, the maximum theoretical performance increase
is 80%.  An L2 cache is not going to achieve near this, and it is not
going to be sustainable when moving data larger than the cache.  Besides,
there are MUCH better ways to handle this than an L2 cache and they could
achieve full performance.  How do I know?  I designed one for a 2-1-1-1
burst (not MC040).

> >The reason L2 cache is so prevalent in the PC realm (and others) is due
> >to the clock doubling, high CPU clock rates, and generally exceeding the
> >DRAM technology of today.  For example, a 166MHz bus speed on a system
> >with 3 clocks/access would require 18ns minimum speed RAM to run at full
> >speed.
> 
> No CPU in a PC uses a 166MHz bus. The fastest Pentiums use 66MHz
> on the bus and often have waitstates on L2 cache accesses.

Thank you Mr. Obvious!  My example was quite specific, albeit hypothetical.
With 60ns DRAM, the access frequency works out to be 16.67MHz.  Data can
be pulled slightly faster than this by some DRAM controller tricks.  But
still, assuming we can get 20 or 25MHz, at 2 clocks per access we still
need wait states for a 66MHz bus.  That is why these super-high CISC
CPU clock speeds are not increasing performance proportionally.  The
cache hit/miss ratio limits the performance: the miss access speed stays
about the same while they try to increase the hit access speed.
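
The wait state numbers in this thread can be sketched the same way. This is
a deliberately simplified model: it treats the DRAM access time as the whole
cycle time and ignores precharge, setup, and controller overhead, so take it
as an estimate only. The function names are made up for illustration.

```python
import math

def wait_states(bus_mhz, clocks_per_access, dram_ns):
    """Extra bus clocks needed before DRAM of the given speed can respond.

    Assumption: the access completes on the first clock edge at or after
    dram_ns, and nothing else adds delay.
    """
    clocks_needed = math.ceil(dram_ns * bus_mhz / 1000)
    return max(0, clocks_needed - clocks_per_access)

def max_dram_ns(bus_mhz, clocks_per_access):
    """Slowest DRAM (in ns) that still runs with zero wait states."""
    return clocks_per_access * 1000 / bus_mhz

print(wait_states(40, 3, 60))           # 40MHz bus, 3 clocks/access, 60ns DRAM
print(wait_states(66, 2, 60))           # 66MHz bus, 2 clocks/access, 60ns DRAM
print(f"{max_dram_ns(166, 3):.1f}ns")   # hypothetical 166MHz bus, 3 clocks/access
```

Under these assumptions the 40MHz case needs no wait states, the 66MHz case
needs two, and the hypothetical 166MHz bus wants roughly 18ns RAM.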

What we need is a faster and *cheaper* DRAM technology!  These problems are
minor... try working with 4 CPUs and a shared, burstable memory port!
=======================================================================
Jeffrey W. Davis (317)451-0503   Domain: c23jwd@eng.delcoelect.com
Software Engineer                UUCP:   deaes!c23jwd
Delco Electronics Corporation    GM:     8-322-0503         Mail: CT40A
--- My computer NEVER cras