Subject: bus_space and barriers
To: None <tech-kern@netbsd.org>
From: Witold J. Wnuk <w.wnuk@zodiac.mimuw.edu.pl>
List: tech-kern
Date: 10/19/2000 00:18:29
The bus_space and barriers issue appeared on the list few times,
some solutions were proposed, but the problem was not fixed.
It needs to be done sooner or later. I suggest sooner. It confuses
people new to NetBSD.

Currently, bus_space_read/write_n() functions do implicit memory
barrier. Existing drivers rely on this. According to bus_space(9),
it is incorrect.

All other read/write functions are affected as well.



First, we need to decide whether to change current semantics or not.
some change pros:

        - C language intuition tells that read or write function
          should complete read or write before returning.
        
        - some programmers don't want to care about barriers

        - implicit barriers are, in fact, what many drivers need
          (in some cases hardware design may force to do barrier
          between _every_ access)

some change cons:

        - it may seem that we are changing semantics because of
          broken drivers

        - we would eventually had to fix properly written drivers
          to get speed advantage


Personally, I tend to think that semantics should be changed.
Jason Thorpe seemed to think the same.


Note that at this point no code needs to be changed, only the manual
page. It won't get worse.

If we choose to change semantics, we would have to decide how to
provide no-barrier version.

Jason Thorpe mentioned __BUS_SPACE_NO_IMPLIED_BARRIERS.
I have looked at drivers that use bus_space_barrier:

grep shows 13 "bus_space_barrier" matches in /sys/dev

ic/com.c - uses barrier once but almost certainly driver wouldn't be
correct without implicit barriers; there are only few places where
no-barrier version will have any real benefits

ic/dp8390.c - uses barriers

ic/i82586.c - uses barriers

ic/ne2000.c - uses barriers

ic/tcic2.c - does not use barriers, but has comment
        "/* XXX need bus_space_barrier here */"

isa/if_ai.c - uses barriers, defines al_read and al_write functions
        that do barriers, and then i82586.c would do barriers
        around these

isa/if_ef.c - same

isa/if_ix.c - same

pci/tga.c - uses barriers

usb/ohci.c - uses barriers before every read or write

usb/uhci.c - same

I assume that all drivers that do not use bus_space_barrier() just
ignore the need, rather then being optimized _that_ much.

So - only few drivers use barriers, and even fewer get any benefits
from doing so.

Without hardware documentation it is hard to say - but I think that
some of above don't use enough barriers (eg. between reading error
counters and zeroing them).


Considering above, I think that there is no need for bus_space_*
to depend on preprocesor define.

If semantics were changed, we could just remove all
bus_space_barrier() calls and _then_ start thinking about optimizing
parts that really need it.

Obviously, in this case, we need to have no-barrier versions.

bus_space_write_nobarrier_1 is bit long.
There is no precedence in bus_space to name it bus_space_write_nb_1.
I was thinking about something like bus_space_push_1 (or fetch or
load or store) - but I'm not native english speaker and don't know
the right word. I hope you get the idea - "write" in function name
means "do write and finish before returning" - and "push" would
mean "issue write".
What do You think?



Next, we have to clearly define barrier version semantics:

- is bus_space_write_1() equivalent to
  bus_space_write_nb_1(); bus_space_barrier()

- is bus_space_write_1() equivalent to
  bus_space_barrier(); bus_space_write_nb_1()

- is bus_space_write_1() equivalent to
  bus_space_barrier(); bus_space_write_nb_1(); bus_space_barrier()

- or is it just guaranted that any read/write function finishes
  before any following read/write function (excluding _nb_
  versions)

Note that in first three cases bus_space_write_1() may serve as
flush point for previous _nb_ writes. In forth it may not.


I would prefer forth solution because it lefts more implementation
freedom. I can imagine processor that has "st_ord" instruction
that will execute in order with and only with other *_ord
instructions.

One explicit bus_space_barrier() isn't that painful.



Implementation on current processors: barrier instruction after
reads and writes. (alternatively - before)
(in current implementation, on some platforms, I've found barriers
before and after)


Other functions: I don't think there is a need for _nb_ versions
of _multi_ or _region_ functions. They copy large amounts of data,
and one barrier (_multi_ probably requires more barriers anyway
(perhaps it is possible to avoid them on current processors?))
can't hurt (or can it? - _nb_ versions could be introduced later
- when there is a need for them).




Few words about performance: actual implementation of bus_space
functions is much more important than optimizing barriers.
For example: despite the fact that in bus_space(9)
bus_space_write_region_N() is allowed to execute writes in
any order, on many platforms there are barriers between each write.
I don't think drivers rely on such behavior.  With new semantics,
one barrier would be needed.



Comments? Proposals? Am I missing anything?



I'm willing to do as much work as I can (ie. do all necessary changes
and help integrating them) regardless of choosen solution.
I would really like to see at least updated manuals pulled
into 1.5 release.


Greetings, (and please excuse me for my poor english)


        Witold J. Wnuk