port-sparc: re: Need sparc openboot reference (fwd)

Subject: re: Need sparc openboot reference (fwd)
To: None <eeh@netbsd.org>
From: Matthew Jacob <mjacob@feral.com>
List: port-sparc
Date: 01/30/1999 13:21:23
> On Sat, 30 Jan 1999, Matthew Jacob wrote:
> 
> > Yes, somebody else mentioned bypassing the MMU. Now, I'm far from
> > knowledgeable about recent Sun internal hardware, but does the ASI also bypass
> > any buffered write hardware? It used to be part of the Comet axioms that
> > you could have an arbitrary number of write buffers between a CPU and the
> > device and that you wouldn't necessarily stall until you tried to read the
> > same location.
> 
> We're comparing a write to a virtual address that's mapped with the
> side-effect bit set in the TTE to a write to an explicit physical address
> bypassing all caches.  My interpretation of the _UltraSPARC_User's_Manual_
> is that all writes go to the store buffer so once it gets beyond the MMU
> all stores are treated the same way.  OTOH, if the CPU is not operating in
> the TSO memory model or is trying to access one of the internal ASIs then
> memory barrier instructions are needed to keep things sane.  This means
> that if you write to a device and then read the same location, you won't
> necessarily stall; you'll just get back the value that's in the write
> buffer.
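The store-buffer behaviour in the quoted paragraph can be pictured with a toy model. This is a hypothetical Python sketch, not a description of real UltraSPARC hardware: `Device`, `StoreBuffer` and their methods are made-up names, and the model simply assumes that a load hitting a pending store is satisfied from the buffer (store-to-load forwarding) without ever reaching the device.

```python
from collections import deque


class Device:
    """Hypothetical device register file that logs every access it actually sees."""

    def __init__(self):
        self.regs = {}
        self.log = []  # accesses that reached the device, in arrival order

    def write(self, addr, value):
        self.log.append(("W", addr, value))
        self.regs[addr] = value

    def read(self, addr):
        value = self.regs.get(addr, 0)
        self.log.append(("R", addr, value))
        return value


class StoreBuffer:
    """Toy FIFO store buffer with store-to-load forwarding (an assumption)."""

    def __init__(self, device):
        self.device = device
        self.pending = deque()  # (addr, value) stores not yet delivered

    def store(self, addr, value):
        # Stores are queued; the CPU does not wait for them to reach the device.
        self.pending.append((addr, value))

    def load(self, addr):
        # Look for the youngest pending store to the same address.
        for pending_addr, value in reversed(self.pending):
            if pending_addr == addr:
                return value  # forwarded: the device never sees this read
        return self.device.read(addr)

    def drain(self):
        # Deliver every pending store in program order (what a barrier would force).
        while self.pending:
            self.device.write(*self.pending.popleft())


dev = Device()
sb = StoreBuffer(dev)
sb.store(0x10, 0xAB)
print(hex(sb.load(0x10)))  # prints 0xab (value comes from the buffer)
print(dev.log)             # prints [] (the device has seen nothing yet)
sb.drain()
print(dev.log)             # prints [('W', 16, 171)]
```

In this model a device whose reads have side effects (say, a FIFO that pops on read) is never told about the forwarded load at all, and a load to a different address goes straight to the device ahead of any still-pending stores.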

It's not so much store ordering that concerns me as reads after the
effect of a write. The SPARC model has always been (IIRC) that if you
push a series of writes to non-primary memory addresses and then issue a
read, the read stalls until all the writes drain. This is what always made
SPARC slow relative to Alpha, which does not ensure write completion w/o a
memory barrier instruction*. The other Comet axioms also state (IIRC) that
if you read uncached memory then all write buffers stall too, and I
believe that sun4m broke this when reads to VME space didn't cause
pending writes to VME space to flush first (leading to deadlock in
hardware).

At any rate, what does the use of ASIs do to this model? If you're using
normal stores and normal reads, I would assume there are snoop lines in
the h/w that ensure drains complete before the read is allowed to proceed,
and that the read gets delivered all the way to the target h/w (not the
store buffer).
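The drain-before-read behaviour assumed for normal loads can likewise be pictured with a toy model. This is a hypothetical Python sketch, not real SPARC hardware: `Device`, `DrainingWriteBuffer` and the `stall_cycles` counter are made-up names, and "one stall unit per drained write" is an assumed cost chosen only to make the slowdown visible.

```python
from collections import deque


class Device:
    """Hypothetical device that records the order in which accesses arrive."""

    def __init__(self):
        self.regs = {}
        self.log = []

    def write(self, addr, value):
        self.log.append(("W", addr, value))
        self.regs[addr] = value

    def read(self, addr):
        value = self.regs.get(addr, 0)
        self.log.append(("R", addr, value))
        return value


class DrainingWriteBuffer:
    """Toy write buffer in which any read first drains all pending writes.

    The read stalls until every earlier write has reached the device, then
    goes all the way to the device instead of being satisfied from the buffer.
    """

    def __init__(self, device):
        self.device = device
        self.pending = deque()
        self.stall_cycles = 0  # hypothetical cost: one unit per drained write

    def write(self, addr, value):
        self.pending.append((addr, value))

    def read(self, addr):
        while self.pending:
            self.device.write(*self.pending.popleft())
            self.stall_cycles += 1
        return self.device.read(addr)


dev = Device()
wb = DrainingWriteBuffer(dev)
wb.write(0x00, 1)       # e.g. program a command register
wb.write(0x04, 2)
status = wb.read(0x04)
print(status)           # prints 2 (read back from the device itself)
print(dev.log)          # prints [('W', 0, 1), ('W', 4, 2), ('R', 4, 2)]
print(wb.stall_cycles)  # prints 2
```

Every read pays for all the writes queued ahead of it, which is the kind of cost that made SPARC slow relative to Alpha. Whether an ASI access behaves like this model, like a forwarding store buffer, or like something else is exactly the open question here.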

I could see scenarios either way: ASIs might ignore write buffer ordering
or draining, or they might cause complete systemic write buffer draining
(it would be a hw platform implementation choice). I'm just asking whether
this is known. I'm sure we can find out, but this really tickles my funny
bone about various projects in the past at Sun.

-matt

* At Kubota a TurboChannel card was being constructed that would be a
100MB/s FIFO graphics engine. The inability to guarantee that reads would
not get ahead of writes still sitting in the write buffers was a great
concern; using memory barriers cut the 100MB/s to 25MB/s.