Subject: re: Need sparc openboot reference (fwd)
To: Matthew Jacob <mjacob@feral.com>
From: Eduardo E. Horvath <eeh@one-o.com>
List: port-sparc
Date: 01/30/1999 18:12:35

On Sat, 30 Jan 1999, Matthew Jacob wrote:

> It's not quite the store ordering I get concerned about. It's read after
> the effect of a write. The SPARC model has always been (IIRC) that if you
> push a series of writes to non-primary memory addresses and then issue a
> read, the read stalls until all the writes drain. This is what always made
> SPARC slow relative to Alpha which does not ensure write completion w/o a
> memory barrier instruction*. The other comet axioms also state that if you

In SPARC v9 there are three memory models:  Total Store Order (TSO),
Partial Store Order (PSO), and Relaxed Memory Order (RMO).  The last one
is new to v9 and is probably similar to the Alpha memory model.  Although
a CPU may well be implemented so that the comet axioms hold, the
architecture does not guarantee them in RMO, and all the manuals caution
you to use memory barrier instructions.

> read uncached memory then all write buffers stall too (IIRC)- and I
> believe that sun4m broke this when reads to the VME space didn't
> cause pending writes to VME space to flush first (leading to deadlock in
> hardware).
> 
> At any rate, what does the use of ASIs do to this model? If you're using
> normal stores and normal reads, I would assume snoop lines in the h/w to
> ensure drains complete prior to allowing the read to proceed and that the
> read gets delivered all the way to the target h/w (not the store buffer).

MEMBAR takes two separate sets of mask bits.  My reading is that one set
constrains ordering within the coherence domain (physical cache), while
the other set is used to order memory operation completions or to access
un-snoopable address ranges.
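
To make the two sets concrete, here is a rough sketch in C of how the
MEMBAR immediate is split, as I read the V9 manual: the low four bits
(mmask) are the ordering constraints and the next three bits (cmask) are
the completion constraints.  The bit values are the ones given in the
manual; the C names and the two helper functions are made up for
illustration and are not from any real header.

```c
/* Ordering constraints: mmask, bits 3:0 of the MEMBAR immediate. */
enum {
    MEMBAR_LOADLOAD   = 0x01,  /* #LoadLoad   */
    MEMBAR_STORELOAD  = 0x02,  /* #StoreLoad  */
    MEMBAR_LOADSTORE  = 0x04,  /* #LoadStore  */
    MEMBAR_STORESTORE = 0x08   /* #StoreStore */
};

/* Completion constraints: cmask, bits 6:4 of the MEMBAR immediate. */
enum {
    MEMBAR_LOOKASIDE = 0x10,   /* #Lookaside */
    MEMBAR_MEMISSUE  = 0x20,   /* #MemIssue  */
    MEMBAR_SYNC      = 0x40    /* #Sync      */
};

/* Hypothetical helpers: extract each field from a combined immediate. */
unsigned membar_mmask(unsigned imm) { return imm & 0x0f; }
unsigned membar_cmask(unsigned imm) { return (imm >> 4) & 0x07; }
```

So something like "membar #StoreLoad | #Lookaside" sets one bit from
each set in the same instruction.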

> I could see scenarios either way where ASIs ignore write buffer ordering
> or draining, or cause complete systemic write buffer draining (it would be 
> a hw platform implementation choice). I'm just asking whether this is
> known? I'm sure that we can find out- but this really tickles my funny
> bone about various projects in the past at Sun.

I think the ASIs in the UltraSPARC I and II simply override the MMU.  You
can use them to explicitly access cached physical addresses, uncached
physical addresses, or virtual addresses with interesting side effects
such as non-faulting loads, user-level protections, or endianness
inversion.

According to the manual, stores to memory regions with side effects
(non-cacheable addresses) are strongly ordered with respect to each other
and will not be combined in the store buffer.  However, they are not well
ordered with respect to cached accesses.  I believe there are separate
store buffers for cached and side-effect stores.

That said, I think the following will answer your question:

"For SPARC-V9 compatibility while in PSO or RMO mode, a MEMBAR #Lookaside
should be used between a store and a subsequent load to the same
noncacheable address."
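
A minimal sketch of what that would look like from C, assuming GCC-style
inline assembly.  The function names are hypothetical, and off SPARC v9
the barrier falls back to a plain compiler barrier just so the example
builds anywhere; that fallback says nothing about what other hardware
needs.

```c
#include <stdint.h>

/* Hypothetical helper: emit MEMBAR #Lookaside on SPARC v9. */
static inline void membar_lookaside(void)
{
#if defined(__sparc_v9__) || defined(__sparcv9)
    __asm__ __volatile__("membar #Lookaside" : : : "memory");
#else
    /* Assumption: compiler barrier only, so the sketch is portable. */
    __asm__ __volatile__("" : : : "memory");
#endif
}

/*
 * Store to a (presumably noncacheable) device register, then load the
 * same address back, with the barrier in between as the manual advises
 * for PSO and RMO.
 */
uint32_t reg_write_then_read(volatile uint32_t *reg, uint32_t value)
{
    *reg = value;        /* the store */
    membar_lookaside();  /* keep the load from being satisfied early */
    return *reg;         /* the subsequent load, same address */
}
```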

> 
> -matt
> 
> * At Kubota a TurboChannel card was being constructed that would be a
> 100MB/s FIFO graphics engine. The inability to guarantee that reads would
> not snoop the write buffers gave great concern- using memory barriers cut
> the 100MB/s to 25MB/s.
> 
> 
> 

=========================================================================
Eduardo Horvath				eeh@one-o.com
	"I need to find a pithy new quote." -- me