Subject: Re: bus_space and barriers
To: "Witold J. Wnuk" <w.wnuk@zodiac.mimuw.edu.pl>
From: Chris G. Demetriou <cgd@sibyte.com>
List: tech-kern
Date: 10/18/2000 15:39:00
w.wnuk@zodiac.mimuw.edu.pl ("Witold J. Wnuk") writes:
>         - C language intuition tells that read or write function
>           should complete read or write before returning.

C language semantics also say that extra reads and writes of variables
or memory don't matter.  (unless specially marked as volatile. 8-)

Unfortunately, Hardware Does Not Act Like That.


>         - some programmers don't want to care about barriers

Programmers who want to write drivers that touch hardware _have_ to
care about e.g. write buffers.  The bus_space barriers are just an
abstraction of that.


>         - implicit barriers are, in fact, what many drivers need
>           (in some cases hardware design may force to do barrier
>           between _every_ access)

No, drivers don't _need_ implicit barriers.  They'd be at most a
convenience for the programmer.


> Personally, I tend to think that semantics should be changed.
> Jason Thorpe seemed to think the same.

I think I probably agree that barriers on should be the default, but
the only valid reason you've come close to providing for it is "some
programmers don't want to care."

In fact, it's more of an issue of robustness and common use.  Having
extra barriers hurts nothing but performance.  APIs should, in my
opinion, cater to correctness but allow for optimization.  The
existing API (which I helped create 8-) caters to optimization and
allows correctness -- but in fact all implementations do the opposite
AFAIK.


> Considering above, I think that there is no need for bus_space_*
> to depend on preprocessor define.
> 
> If semantics were changed, we could just remove all
> bus_space_barrier() calls and _then_ start thinking about optimizing
> parts that really need it.

There are definitely cases that need the optimization.  TGA performance,
for one, will get a lot better without the barriers (since most of the
accesses don't need them, AFAIK).


> Next, we have to clearly define barrier version semantics:
> 
> - is bus_space_write_1() equivalent to
>   bus_space_write_nb_1(); bus_space_barrier()
> 
> - is bus_space_write_1() equivalent to
>   bus_space_barrier(); bus_space_write_nb_1()
> 
> - is bus_space_write_1() equivalent to
>   bus_space_barrier(); bus_space_write_nb_1(); bus_space_barrier()
> 
> - or is it just guaranteed that any read/write function finishes
>   before any following read/write function (excluding _nb_
>   versions)
> 
> Note that in the first three cases bus_space_write_1() may serve as
> flush point for previous _nb_ writes. In the fourth it may not.

and, more importantly, which bus_space_barrier _flags_ are meant by
each.

For instance, on many alphas, there are two different types: write
(wmb) and read-write (mb).  (On the rest, the former are treated as
the latter.)
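
A rough sketch of that mapping in C, for illustration only.  The flag
names, the enum, and the has_wmb parameter are all made up for this
example; they are not the bus_space(9) flags or any port's real code.
It assumes that any barrier involving reads needs the full mb, and that
a write-only barrier can use wmb where the CPU has it.

```c
#include <assert.h>

/* Hypothetical flags, standing in for a port's barrier flags. */
#define TOY_BARRIER_READ	0x01
#define TOY_BARRIER_WRITE	0x02

/* Which barrier instruction the sketch would issue. */
enum toy_insn { TOY_INSN_NONE, TOY_INSN_WMB, TOY_INSN_MB };

enum toy_insn
toy_pick_barrier(int flags, int has_wmb)
{
	/* Only the two flags above are defined in this sketch. */
	assert((flags & ~(TOY_BARRIER_READ | TOY_BARRIER_WRITE)) == 0);

	if (flags == 0)
		return TOY_INSN_NONE;

	/* Assumption: anything that orders reads needs the full mb. */
	if (flags & TOY_BARRIER_READ)
		return TOY_INSN_MB;

	/* Write-only: use wmb if present, else fall back to mb. */
	return has_wmb ? TOY_INSN_WMB : TOY_INSN_MB;
}
```

The point is only that the flags passed by the caller decide how
expensive the barrier is, which is why the flags matter when defining
what an implicit barrier means.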

So, for instance:

	write a
	wmb
	write b
	wmb
	write c

is certainly OK, but:

	write a
	wmb
	write b
	[ X ]
	read c

may or may not be OK with "X" being mb, wmb, or nothing, depending
on the relationship of a, b, and c, specifically whether c depends on
writes to a and b having been seen first.

There are similar issues on the read side.
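
The write/read case can be made concrete with a toy user-space model.
This is a sketch under loud assumptions, not real hardware and not the
bus_space API: every name is hypothetical, writes sit in a FIFO write
buffer until an mb drains it, wmb only orders writes and drains
nothing, reads bypass the buffer, and register c is defined to read
back as a + b (so c depends on both writes).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_REG_A	0
#define TOY_REG_B	1
#define TOY_WBUF	8	/* write buffer depth in this model */

enum toy_x { TOY_X_NONE, TOY_X_WMB, TOY_X_MB };

struct toy_bus {
	uint32_t dev[2];	/* a and b as the device has seen them */
	struct { int reg; uint32_t val; } wbuf[TOY_WBUF];
	size_t pending;		/* writes still sitting in the buffer */
};

/* Writes are only queued; the device does not see them yet. */
static void
toy_write(struct toy_bus *b, int reg, uint32_t val)
{
	assert(reg == TOY_REG_A || reg == TOY_REG_B);
	assert(b->pending < TOY_WBUF);
	b->wbuf[b->pending].reg = reg;
	b->wbuf[b->pending].val = val;
	b->pending++;
}

/* Model wmb: the FIFO already keeps write order, and nothing drains. */
static void
toy_wmb(struct toy_bus *b)
{
	(void)b;
}

/* Model mb: push every queued write to the device, in order. */
static void
toy_mb(struct toy_bus *b)
{
	size_t i;

	for (i = 0; i < b->pending; i++)
		b->dev[b->wbuf[i].reg] = b->wbuf[i].val;
	b->pending = 0;
}

/* Reads bypass the write buffer and see only what the device has. */
static uint32_t
toy_read_c(const struct toy_bus *b)
{
	return b->dev[TOY_REG_A] + b->dev[TOY_REG_B];
}

/* write a; wmb; write b; [ X ]; read c */
uint32_t
toy_sequence(enum toy_x x)
{
	struct toy_bus b;

	memset(&b, 0, sizeof(b));
	toy_write(&b, TOY_REG_A, 1);
	toy_wmb(&b);
	toy_write(&b, TOY_REG_B, 2);
	if (x == TOY_X_WMB)
		toy_wmb(&b);
	else if (x == TOY_X_MB)
		toy_mb(&b);
	return toy_read_c(&b);
}
```

In this model only X = mb lets the read see both writes; with wmb or
nothing the read gets the stale value.  If c did not depend on a and b,
all three choices would be equally fine, which is the point above.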


> I would prefer the fourth solution because it leaves more implementation
> freedom. I can imagine processor that has "st_ord" instruction
> that will execute in order with and only with other *_ord
> instructions.

I'd prefer if these APIs were actually reflective of real hardware,
and truly probable real hardware.  I can imagine many things, but at
least with respect to this topic, as far as I'm concerned, if they
don't exist now they're fairly unlikely to exist in the near future.



> Other functions: I don't think there is a need for _nb_ versions
> of _multi_ or _region_ functions. They copy large amounts of data,
> and one barrier (_multi_ probably requires more barriers anyway
> (perhaps it is possible to avoid them on current processors?))
> can't hurt (or can it? - _nb_ versions could be introduced later
> - when there is a need for them).

Well, things kinda break down here.  For instance, write_multi_N
pretty much _has_ to imply a write barrier on systems with write
buffers, otherwise it could end up being issued as exactly one write,
as you've noticed.  8-)

For some devices, though, there's really no point in having that
barrier at all, and barriers can be fairly expensive operations.
(e.g. if you're copying something to a frame buffer...)
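
To illustrate the write_multi_N point, here is a toy model of a
merging write buffer in front of a single device FIFO port.  It is a
sketch only: all names are hypothetical, the buffer is assumed to hold
one pending write per address and to overwrite it when a second write
to the same address arrives, and a barrier is assumed to push the
pending write out to the device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_FIFO_DEPTH	16

struct toy_port {
	int	 has_pending;	/* one-entry write buffer for the port */
	uint32_t pending;
	uint32_t fifo[TOY_FIFO_DEPTH];	/* what the device received */
	size_t	 fifo_len;
};

/* A write to the port merges with (overwrites) any pending write. */
static void
toy_port_write(struct toy_port *p, uint32_t v)
{
	p->pending = v;
	p->has_pending = 1;
}

/* A barrier pushes the pending write, if any, out to the device. */
static void
toy_port_barrier(struct toy_port *p)
{
	if (!p->has_pending)
		return;
	assert(p->fifo_len < TOY_FIFO_DEPTH);
	p->fifo[p->fifo_len++] = p->pending;
	p->has_pending = 0;
}

/*
 * Write n words to the same port, with or without a barrier after
 * each one.  Returns how many words the device actually received.
 */
size_t
toy_write_multi(const uint32_t *src, size_t n, int barrier_each)
{
	struct toy_port p;
	size_t i;

	memset(&p, 0, sizeof(p));
	for (i = 0; i < n; i++) {
		toy_port_write(&p, src[i]);
		if (barrier_each)
			toy_port_barrier(&p);
	}
	toy_port_barrier(&p);	/* final drain */
	return p.fifo_len;
}
```

With four words and a barrier after each, the device in this model
receives four words; without the per-word barrier it receives one (the
last).  A frame buffer copy writes to distinct addresses, so nothing
would merge there and the barriers would be pure cost.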


> Few words about performance: actual implementation of bus_space
> functions is much more important than optimizing barriers.
> For example: despite the fact that in bus_space(9)
> bus_space_write_region_N() is allowed to execute writes in
> any order, on many platforms there are barriers between each write.
> I don't think drivers rely on such behavior.  With new semantics,
> one barrier would be needed.

Given that the ordering of write_region_N is undefined, most platforms
shouldn't need barriers between the elements, no.

In general, I don't think much effort has been put into optimizing the
region and multi variants, (I hope 8-) at least in part probably
because there's been little effort to investigate which are
used widely and how much effort is worth _expending_.



cgd