tech-kern: Re: change bus_space

Subject: Re: change bus_space_barrier() semantic
To: Manuel Bouyer <bouyer@antioche.eu.org>
From: Garrett D'Amore <garrett_damore@tadpole.com>
List: tech-kern
Date: 01/07/2006 09:43:48
Manuel Bouyer wrote:

>Hi,
>while looking at how bus_space_read/write can be reordered on various ports
>I've dicovered the following:
>1- not all drivers use bus_space_barrier() to ensure ordering of register
>  access
>2- because of 1, all bus_space implementation (on hardware where it's needed)
>  have implicit barrier in the read and write methods
>3- the implementation isn't in sync with the man page. The man page defines
>  5 different barrier: BUS_SPACE_BARRIER_{READ,WRITE}_BEFORE_{READ,WRITE}
>  and BUS_SPACE_BARRIER_SYNC, while the implementation has only
>  BUS_SPACE_BARRIER_READ and BUS_SPACE_BARRIER_WRITE. The man page match
>  more closely what's some hardware (e.g. sparc64) can do.
>4- the man page mention a BUS_SPACE_MAP_PREFETCHABLE flag to bus_space_map(),
>  which should "try to map the space so that accesses can be prefetched by the
>  system, and writes can be buffered". This implies that if
>  BUS_SPACE_MAP_PREFETCHABLE is not used, read and writes should stay
>  ordered and bus_space_barrier() calls should not be needed.
>  
>
Solaris provides the following options, which probably reflects what
some SPARC or x86 hardware has supported in the past.

DDI_STRICTORDER_ACC -- strict ordering, etc.
DDI_UNORDERED_OK_ACC -- CPU can reorder references
DDI_MERGING_OK_ACC -- CPU can merge consecutive stores
DDI_LOADCACHING_OK_ACC -- CPU can cache loaded data

Each of these consecutive levels implies all of the previous.  (I.e.
LOADCACHING means that the mapped region can also support any of the
previous methods.)

>5- for most hardware, a barrier is needed for *each* bus_space access.
>  
>
I disagree with this statement.  There are lot of devices I've run into
where the barriers are only needed on e.g. one register, or a certain
group of registers.

There is also the case of larger windows where e.g. I might update a
bunch of fields for a packet and only need a barrier when I finally tell
the hardware I'm doing updating fields and it can process the packet (or
disk block, or crypto buffer, or whatever.)

>  The only case where this may not be needed is hardware providing some
>  ROM or buffer space directly mappable.
>
>Because of this, I propose the following:
>- change the implementation to be in sync with the man page (that is,
>  remove BUS_SPACE_BARRIER_{READ,WRITE} and implement the 5 barriers
>  defined in the man page)
>  
>
The problem is that some ports don't have all these levels.  I think you
still want to keep the older _READ/WRITE methods, but you can implement
them in terms of the finer grained methods.

>- change the implementation so that if the space is mapped without
>  BUS_SPACE_MAP_PREFETCHABLE, bus_space_barrier() calls are not needed.
>  This may require using bus_space_barrier() in the read/write methods
>  to enforce this, and in fact I'll probably be conservative and do it this
>  way on most ports, leaving the portmaster change this to better suit
>  the hardware.
>  
>
Changing the "implementation" in this case means changing a *lot* of
implementations.  Each port has its own bus_space logic.  Not all of the
bus_spaces have a structure associated with them where you can tag how
the space was mapped.

I'm also not 100% certain about the change in semantic of
BUS_SPACE_MAP_PREFETCHABLE.

I would actually think it would be the reverse -- if you mapped it
prefetchable, then you are saying the hardware doesn't care (e.g. memory
without side effects) and you shouldn't need barriers.

I think to get this right, you need to have more flags to bus_space_map
which describe the needs of the target device (see the flags given by
Solaris above) so that the implementation can do the "right" thing.

Just randomly applying barriers to all bus space acccesses could have a
nasty impact on performance.  For a lot of hardware these barriers are
implemented as a cache flush.  Now imagine that I'm working with DMA
mapped descriptors (e.g. a typical NIC).  Instead of having a single
cache flush when I finish writing several (maybe up to a dozen?) fields
for a given descriptor, I am going to get a several dozen cache flushes.

Now since this probably happens with the interrupts blocked, it might
not be that bad -- for a uniprocessor system -- but it probably depends
on the hardware whether a cache flush is evil or not.  On SMP hardware
it seems likely that this would be very unfriendly.  (Hopefully the SMP
hardware has something to make these efficient, but in the worst case it
might require some expensive calls to require *two* caches to be flushed.)


>- update drivers to match the new implementation (probably removing
>  bus_space_barrier() calls for most of them, or converting them to
>  BUS_SPACE_MAP_PREFETCHABLE)
>  
>
Very, very few drivers can probably use BUS_SPACE_MAP_PREFETCHABLE.  
Most of these devices use bus spaces for register accesses.

In general, I like the idea of removing this logic from drivers, they
shouldn't have to worry about this (other than establishing the type of
support needed when the device is first mapped) and the bus should take
care of it.

But getting this right is going to be a big job, and require changes to
each of the ports.

Mightn't this be a good time to try to centralize some of this logic?  
Imagine instead of common bus_space layer that provides a single handle,
which exports function pointers for each of the bus space routines
(including a function for barriers.)  Then you have somewhere to store
the flags, and can probably export some "common" routines for use by
typical ports -- e.g. memory mapped busses that have very little
differences from one another.

Of course, this also implies a fair amount of coordination from the
various portmasters.

>Any comments ? Did I miss something ?
>  
>
I think I said my piece. :-)

-- 
Garrett D'Amore                          http://www.tadpolecomputer.com/
Sr. Staff Engineer          Extending the Power of 64-bit UNIX Computing
Tadpole Computer, Inc.                             Phone: (951) 325-2134