Subject: Re: bus_space and barriers
To: (Chris G. Demetriou) <cgd@sibyte.com>
From: Witold J. Wnuk <w.wnuk@zodiac.mimuw.edu.pl>
List: tech-kern
Date: 10/19/2000 01:45:57
On 18-Oct-00, 22:39:00 Chris G. Demetriou wrote:
>  w.wnuk@zodiac.mimuw.edu.pl ("Witold J. Wnuk") writes:
> >         - C language intuition tells that read or write function
> >           should complete read or write before returning.
>  
>  C language semantics also say that extra reads and writes of
>  variables or memory don't matter.  (unless specially marked as
>  volatile. 8-)
>  
>  Unfortunately, Hardware Does Not Act Like That.
>  
>  
> >         - some programmers don't want to care about barriers
>  
>  Programmers who want to write drivers that touch hardware _have_ to
>  care about e.g. write buffers.  The bus_space barriers are just an
>  abstraction of that.

Hardware documentation for a device rarely reminds programmers about
CPU issues such as barriers and ordering. That's why providing an
abstraction with implicit barriers seems to be a good idea.

If we provide an interface and say that a read is a read and a write is
a write, and that they are executed in order (from the bus POV),
programmers _don't have_ to care about write buffers or even know they
exist.
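
As a rough illustration of the implicit-barrier idea, here is a minimal
sketch against a toy bus model. Everything in it (struct mock_bus,
mock_write_nb_1(), mock_barrier(), mock_write_1()) is hypothetical and
only mimics the shape of the proposal; it is not the real bus_space
interface, and the "write buffer" is just an array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_SLOTS 8

/* One byte-wide write as it travels towards the device. */
struct mock_write {
	uint32_t offset;
	uint8_t  value;
};

/* Toy bus: a CPU-side write buffer plus the writes the device has seen. */
struct mock_bus {
	struct mock_write posted[MOCK_SLOTS];	/* still in the write buffer */
	size_t            nposted;
	struct mock_write seen[MOCK_SLOTS];	/* arrived at the device */
	size_t            nseen;
};

/* Hypothetical _nb_ write: only posts into the write buffer. */
static void
mock_write_nb_1(struct mock_bus *b, uint32_t offset, uint8_t value)
{
	assert(b->nposted < MOCK_SLOTS);
	b->posted[b->nposted].offset = offset;
	b->posted[b->nposted].value = value;
	b->nposted++;
}

/* Hypothetical barrier: push everything posted so far out to the device. */
static void
mock_barrier(struct mock_bus *b)
{
	for (size_t i = 0; i < b->nposted; i++) {
		assert(b->nseen < MOCK_SLOTS);
		b->seen[b->nseen++] = b->posted[i];
	}
	b->nposted = 0;
}

/* Implicit-barrier write: when it returns, the device has seen the write. */
static void
mock_write_1(struct mock_bus *b, uint32_t offset, uint8_t value)
{
	mock_write_nb_1(b, offset, value);
	mock_barrier(b);
}
```

With this shape a driver that only ever calls mock_write_1() never
has to mention the barrier, while a performance-sensitive one can still
use the _nb_ form and place barriers by hand.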



> >         - implicit barriers are, in fact, what many drivers need
> >           (in some cases hardware design may force to do barrier
> >           between _every_ access)
>  
>  No, drivers don't _need_ implicit barriers.  They'd be at most a
>  convenience for the programmer.

OK - barriers accompanying accesses are what programmers actually use
in many drivers.



> > Personally, I tend to think that semantics should be changed.
> > Jason Thorpe seemed to think the same.
>  
>  I think i probably agree that barriers on should be the default, but
>  the only valid reason you've come close to providing for it is "some
>  programmers don't want to care."
And it is probably the main one.


>  There are definitely cases that need it.  TGA performance, for one,
>  will get a lot better without them (since most of the accesses don't
>  need them, AFAIK).
Certainly.



>  
>  
> > Next, we have to clearly define barrier version semantics:
> > 
> > - is bus_space_write_1() equivalent to
> >   bus_space_write_nb_1(); bus_space_barrier()
> > 
> > - is bus_space_write_1() equivalent to
> >   bus_space_barrier(); bus_space_write_nb_1()
> > 
> > - is bus_space_write_1() equivalent to
> >   bus_space_barrier(); bus_space_write_nb_1(); bus_space_barrier()
> > 
> > - or is it just guaranteed that any read/write function finishes
> >   before any following read/write function (excluding _nb_
> >   versions)
> > 
> > Note that in the first three cases bus_space_write_1() may serve as
> > a flush point for previous _nb_ writes. In the fourth it may not.
>  
>  and, more importantly, which bus_space_barrier _flags_ are meant by
>  each.
>  
>  For instance, on many alphas, there are two different types: write
>  (wmb) and read-write (mb).  (On the rest, the former are treated as
>  the latter.)
>  
>  So, for instance:
>  
>       write a
>       wmb
>       write b
>       wmb
>       write c
>  
>  is certainly OK, but:
>  
>       write a
>       wmb
>       write b
>       [ X ]
>       read c
>  
>  may or may not be OK with "X" being mb, wmb, or nothing, depending
>  on the relationship of a, b, and c, specifically whether C depends
>  on writes to a and b having been seen first.
>  
>  There are similar issues on the read side.
In the fourth case it would then need to be a read-write barrier after
each read and after each write. Alternatively, before each.
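
The write/write/read example above can be mimicked with a deliberately
crude model. This is only a sketch under loud assumptions: the toy_*
names are made up, "c" is assumed to be a device register that reflects
a + b, writes are assumed to sit in a buffer that reads bypass, and
only the read-write barrier is assumed to drain it. No real Alpha is
claimed to behave exactly like this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NREGS 2	/* register 0 is "a", register 1 is "b" */

enum toy_barrier { TOY_NONE, TOY_WMB, TOY_MB };

struct toy_dev {
	uint8_t reg[TOY_NREGS];		/* what the device currently holds */
	uint8_t posted[TOY_NREGS];	/* values waiting in the write buffer */
	int     dirty[TOY_NREGS];	/* nonzero if posted[i] is pending */
};

/* Writes are only posted; the device does not see them yet. */
static void
toy_write(struct toy_dev *d, size_t r, uint8_t v)
{
	assert(r < TOY_NREGS);
	d->posted[r] = v;
	d->dirty[r] = 1;
}

/*
 * In this toy model a write barrier only orders writes among themselves,
 * which the per-register slots already do, so it has nothing to drain.
 * Only the read-write barrier forces posted writes out before later reads.
 */
static void
toy_barrier(struct toy_dev *d, enum toy_barrier kind)
{
	if (kind != TOY_MB)
		return;
	for (size_t r = 0; r < TOY_NREGS; r++) {
		if (d->dirty[r]) {
			d->reg[r] = d->posted[r];
			d->dirty[r] = 0;
		}
	}
}

/* Reads bypass the write buffer; "c" is assumed to reflect a + b. */
static uint8_t
toy_read_c(const struct toy_dev *d)
{
	return (uint8_t)(d->reg[0] + d->reg[1]);
}
```

In this model "write a; wmb; write b; wmb; read c" reads stale state,
while putting a read-write barrier at X gives the read the values it
depends on.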

>  
>  
> > I would prefer the fourth solution because it leaves more
> > implementation freedom. I can imagine a processor that has an
> > "st_ord" instruction that will execute in order with, and only
> > with, other *_ord instructions.
>  
>  I'd prefer if these APIs were actually reflective of real hardware,
>  and truly probable real hardware.  I can imagine many things, but at
>  least with respect to this topic, as far as i'm concerned, if they
>  don't exist now they're fairly unlikely to exist in the near future.
But it is still a good idea not to impose too many restrictions on the
implementation. (On the other hand, it may result in the same situation
as we have today: despite the specification, programmers may start to
rely on it serving as a flush point.)
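
To make the flush-point question concrete, here is a small sketch that
contrasts the first definition (write_nb followed by a barrier) with the
fourth (ordered only against other ordered accesses). The fp_* names
and the two-queue model are assumptions made up for illustration, not
part of bus_space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FP_SLOTS 8

struct fp_bus {
	uint8_t pending[FP_SLOTS];	/* _nb_ writes not yet pushed out */
	size_t  npending;
	uint8_t seen[FP_SLOTS];		/* values in the order the device saw them */
	size_t  nseen;
};

/* Hypothetical _nb_ write: stays pending until some barrier drains it. */
static void
fp_write_nb(struct fp_bus *b, uint8_t v)
{
	assert(b->npending < FP_SLOTS);
	b->pending[b->npending++] = v;
}

/* Hypothetical barrier: drain all pending _nb_ writes, in order. */
static void
fp_barrier(struct fp_bus *b)
{
	for (size_t i = 0; i < b->npending; i++) {
		assert(b->nseen < FP_SLOTS);
		b->seen[b->nseen++] = b->pending[i];
	}
	b->npending = 0;
}

/* First definition: write_nb then barrier, so it flushes earlier _nb_ writes. */
static void
fp_write_flush(struct fp_bus *b, uint8_t v)
{
	fp_write_nb(b, v);
	fp_barrier(b);
}

/*
 * Fourth definition: ordered only against other ordered writes.
 * Pending _nb_ writes are left alone, so this is not a flush point.
 */
static void
fp_write_ordered(struct fp_bus *b, uint8_t v)
{
	assert(b->nseen < FP_SLOTS);
	b->seen[b->nseen++] = v;
}
```

A driver that posts an _nb_ write and then relies on the next plain
write to push it out works with fp_write_flush() and silently breaks
with fp_write_ordered(), which is exactly the habit the specification
would have to warn against.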


> > Other functions: I don't think there is a need for _nb_ versions
> > of _multi_ or _region_ functions. They copy large amounts of data,
> > and one barrier (_multi_ probably requires more barriers anyway
> > (perhaps it is possible to avoid them on current processors?))
> > can't hurt (or can it? - _nb_ versions could be introduced later
> > - when there is a need for them).
>  
>  well, things kinda break down here.  for instance, write_multi_N
>  pretty much _has_ to imply a write barrier on systems with write
>  buffers, otherwise it could end up being issued as exactly one
>  write, as you've noticed.  8-)
>  
>  For some devices, though, there's really no point in having that
>  barrier at all, and barriers can be fairly expensive operations.
>  (e.g. if you're copying something to a frame buffer...)
The _region_ version will be used for that, right?
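
The multi/region difference can be sketched with a toy merging write
buffer. The co_* names, the merge-by-offset rule and the barrier_each
switch are all assumptions for illustration; real write buffers and the
real bus_space functions are more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CO_SLOTS 16

struct co_write {
	uint32_t offset;
	uint8_t  value;
};

struct co_bus {
	struct co_write buf[CO_SLOTS];	/* merging write buffer */
	size_t          nbuf;
	size_t          ndelivered;	/* bus writes the device actually saw */
};

/* Post a write; a write to an offset already buffered is merged into it. */
static void
co_post(struct co_bus *b, uint32_t offset, uint8_t value)
{
	for (size_t i = 0; i < b->nbuf; i++) {
		if (b->buf[i].offset == offset) {
			b->buf[i].value = value;
			return;
		}
	}
	assert(b->nbuf < CO_SLOTS);
	b->buf[b->nbuf].offset = offset;
	b->buf[b->nbuf].value = value;
	b->nbuf++;
}

/* Drain the buffer; each remaining entry becomes one write on the bus. */
static void
co_barrier(struct co_bus *b)
{
	b->ndelivered += b->nbuf;
	b->nbuf = 0;
}

/* multi: every element goes to the same offset (e.g. a FIFO register). */
static void
co_write_multi_1(struct co_bus *b, uint32_t offset, const uint8_t *src,
    size_t n, int barrier_each)
{
	for (size_t i = 0; i < n; i++) {
		co_post(b, offset, src[i]);
		if (barrier_each)
			co_barrier(b);
	}
	co_barrier(b);
}

/* region: consecutive offsets, so one barrier at the end is enough here. */
static void
co_write_region_1(struct co_bus *b, uint32_t offset, const uint8_t *src,
    size_t n)
{
	for (size_t i = 0; i < n; i++)
		co_post(b, offset + (uint32_t)i, src[i]);
	co_barrier(b);
}
```

In this model a four-byte multi write without per-element barriers
reaches the device as a single write, while the region write delivers
all four elements with one barrier, which is why the frame buffer case
looks like a job for _region_.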



>  Given that the ordering of write_region_N is undefined, most
>  platforms shouldn't need barriers between the elements, no.
>  
>  In general, I don't think much effort has been put into optimizing
>  the region and multi variants, (i hope 8-) at least in part probably
>  because there's been little effort to investigate which are
>  used widely and how much effort is worth _expending_.
Many network cards (mostly older ones, though) use write_region. I think
it is worth some effort, and it won't take much.



Greetings,

        Witold J. Wnuk