Subject: Re: Cache Chip Bug
To: None <abs@netbsd.org>
From: Chris Torek <torek@BSDI.COM>
List: port-sparc
Date: 07/07/2000 04:41:31
>How much of a performance penalty is the workaround ...

Depends how many traps you take, and how often they would have been
in the cache? :-)  (Note that the trap table occupies exactly one
page, so it is really just the 4 instructions at the beginning of
the trap that are not cached -- all traps just set up a few things
and jump to cached code.)

The original sparc chips have something like a 4-stage pipeline,
and a trap has to flush out the pipe anyway.  CPU to cache i-fetch
bandwidth is probably not much different from CPU to main memory
i-fetch bandwidth at 20 MHz (though the gap is certainly wider at
40 MHz on SS2's, and wider yet on the Weitek PowerUP).  The main
penalty will be going through the MMU, which I think relies on the
cache to avoid table walks.  At
a (very rough) guess, figure those 4 instructions from the trap page
run 2x to 3x slower than they would if they were from cache.  Assuming
a typical "fast" trap (rwindow save/load) takes at least 120 cycles,
and that those 4 uncached instructions go from 4 cycles to 12 cycles,
you have gone from 120 to 128 cycles, an increase of about 6.7%.

A typical "slow" trap (e.g., syscall and return) of course runs
well into the thousands of cycles, so the penalty there drops way below
1%.  Of course it is exactly those "fast" (e.g., rwindow) traps that
you care about here ... luckily (?) rwindow save/load is already badly
memory-bandwidth limited, so the faster the CPU, the more cycles it
spends waiting in the (cached) rwindow code anyway, hence the smaller
the effect of the (uncached) 4 instructions in the trap page.
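
For anyone who wants to play with the numbers, here is a rough sketch
of the arithmetic above in Python.  The function name and its
parameters are made up for illustration, and the 2000-cycle figure for
a "slow" trap is only an assumed stand-in for "well into the
thousands"; the 120-cycle fast trap, the 4 entry instructions, and the
3x slowdown are the guesses from the text.

```python
def uncached_penalty(trap_cycles, entry_cycles=4, slowdown=3):
    """Estimate the cost of running the trap-page instructions uncached.

    trap_cycles:  total cycles for the trap when everything is cached
    entry_cycles: cycles spent on the trap-page instructions when cached
    slowdown:     assumed factor by which those instructions slow down
    Returns (new_total_cycles, percent_increase).
    """
    extra = entry_cycles * slowdown - entry_cycles
    return trap_cycles + extra, 100.0 * extra / trap_cycles


# "Fast" trap (rwindow save/load): at least 120 cycles, per the estimate above.
fast_total, fast_pct = uncached_penalty(120)

# "Slow" trap (syscall and return): 2000 cycles is an assumed figure.
slow_total, slow_pct = uncached_penalty(2000)

print(f"fast trap: {fast_total} cycles, +{fast_pct:.2f}%")
print(f"slow trap: {slow_total} cycles, +{slow_pct:.2f}%")
```

Changing slowdown to 2 gives the optimistic end of the 2x-to-3x guess.
None of this models the memory-bandwidth stalls in the rwindow code,
which would only shrink the relative penalty further.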

>and what sort of symptoms does the bug exhibit otherwise?

If I recall correctly, the cache simply delivers wrong data.  Often
this turns into an illegal instruction, so that you get a trap
during a trap, which causes a reset.  This condition can only be
caught by the ROM, and by then it is too late to do anything about
it.  In other cases, the wrong data might be a valid instruction;
who knows what will happen then.

Chris