Subject: Re: change bus_space_barrier() semantic
To: Garrett D'Amore <garrett_damore@tadpole.com>
From: Manuel Bouyer <bouyer@antioche.eu.org>
List: tech-kern
Date: 01/09/2006 14:55:12
On Mon, Jan 09, 2006 at 03:39:32AM -0800, Garrett D'Amore wrote:
> >Hum, then I guess there are some hardware not well described by this model,
> >although I don't examples in hand.
> > 
> >
> Not described by which model, the NetBSD one or the Solaris one?

The solaris one

> 
> Are you saying that there is a case where hardware caching would be 
> desirable, but yet merging or reordering accesses would be bad?  I can't 
> imagine this to be true, but maybe I'm not understanding what you are 
> meaning here.

With a write-thru cache, I think you can achieve read caching without
enabling write merging/reordering. Whenever this is desireable depends
on the device you're talking to.
For example, with ix(4) i think you could write to the buffer using PIO
and read it using the memory-mapped ram.

> >>do.  For example, imagine two 8 bit registers side by side.  One of them
> >>gets updated in an interrupt, and one gets updated in main line code. 
> >>You can't make this work right easily from a software perspective, but
> >>hardware can.
> >>   
> >>
> >
> >But I don't think of a hardware were this would do desireable. This is
> >typically the kind of situation where you want write order to be preserved,
> >and probably synchronous in the interrupt case.
> > 
> >
> You would want to ensure that the write completed in the interrupt case, 
> but if another register write was beside it and had been written 
> earlier, it might be benign to write them both concurrently in the 
> interrupt case.

Maybe, but probably still ordered. 

> 
> I can't think of any concrete examples of this, just hypothesizing that 
> such devices might exist.

The concrete examples I have in mind would need write ordering preserved,
to avoid race condition issues which could lead to interrupt loss.

> 
> >
> >>>For these devices (e.g. ethernet adapters with the buffer mapped in I/O 
> >>>space,
> >>>or framebuffer devices) the driver can use BUS_SPACE_MAP_PREFETCHABLE and
> >>>bus_space_barrier().
> >>>
> >>I don't think so.  If e.g. I have a wi(4), then I want to ensure that
> >>all reads actually do happen, and I care about the order of access. 
> >>What I don't want is a cache flush between every read.
> >>   
> >>
> >
> >And this is specifially the case where you don't want
> >BUS_SPACE_MAP_PREFETCHABLE. BUS_SPACE_MAP_PREFETCHABLE isn't appropriate 
> >for
> >drivers using bus_space_read/write_multi_*, but for drivers using
> >bus_space_read/write_region_*. in the _multi_ case, either the space can
> >be mapped uncached, and then there is no cache flush needed (and the
> >barrier, implicit or explicit, shouldn't issue one) 
> >
> A lot (most?) of MIPS-based ports seem to ignore the mapping and flush 
> the write cache in the barrier.  I've not  adequately checked the other 
> ports.  More about this "bug" below.
> 
> However, I think I am seeing your point now.
> 
> >or the space
> >can't be mapped uncached and the cache flush is unavoidable.
> >
> > 
> >
> >>In fact, what I'd really like to have happen is on hardware that
> >>supports it, not have a cacheable mapping to this memory.
> >>   
> >>
> >
> >This is why there is BUS_SPACE_MAP_CACHEABLE flag to bus_space_map.
> >The space won't be mapped cachable (if possible) if BUS_SPACE_MAP_CACHEABLE
> >is not specified.
> > 
> >
> So a typical PCI implementation would then have nop for barrier?  

Yes

> > 
> >
> >>If however, you can set it up so that bus_space_barrier knows whether
> >>the memory space is cached or not (e.g. by way of the flags), and is a
> >>noop for those mappings that are not cached, then this would probably be
> >>very good.
> >>   
> >>
> >
> >bus_space_barrier already has the informations for this. If it's not using 
> >it
> >it's a bug.
> > 
> >
> Okay.  Will you be fixing any implementation(s)? that suffer from this 
> defect?

I probably don't have the knowledges to fix this, but I'll look at it anyway

> 
> >>What I'm saying is that drivers that want _READ or _WRITE (and don't
> >>care to distinguish between the variations) should be able to express
> >>that simply.  (I.e. BARRIER_WRITE can be a bit field combining
> >>read-before-write and write-before-write.)
> >>   
> >>
> >
> >Ha, OK. Yes, we could keep this as a shortcut. But it's better if
> >drivers specify accurately their needs, to that hardware can acurately
> >specify what they need. I think read-before-read and write-before-write
> >are the most common cases.
> > 
> >
> I think there are a lot of cases where the driver just wants plain, 
> unordered, uncached access to device registers.

I think you mean "ordered" here.

> Accurately figuring out 
> barriers beyond that is likely to be somewhat painful for a large 
> variety of drivers.

Sure. But as most drivers are not using barrier at all right now, it's not
a problem.

> 
> >An example where PREFETCHABLE is desireable, but you still want memory
> >barrier are ethernet drivers for device which have their buffer mapped
> >(for example ix(4), which has its ram mapped in memory space).
> >To transmit a packet you copy the data to the mapped ram (with
> >bus_space_write_region, not bus_space_write_multi - on hardware where this
> >is memory-mapped linearly, it's just a memcpy, but on some hardware it may
> >have to be a bus_space_write in a loop) and then start the transmit with
> >a write to another register. You don't care if the writes to the ram are
> >reordered, they're each at different addresses. however, all these writes
> >have to be completed before the write which starts the transmit.
> >In this case it's desireable to to map the space PREFETCHABLE because the
> >data copy to/from the device's ram will be much faster, and you enforce
> >ordering of access to the control registers with a bus_space_barrier.
> > 
> >
> I agree that this kind of device may work well.  But what impact will 
> prefetch have on other register accesses to the device?  I guess the 
> driver can explictly use barriers everywhere to get correct behavior.   
> (This creates two kinds of driver -- those that use prefetch and 
> explicit barriers, and those that rely on implicit barriers.)

Yes, exactly. It's up to the device writer to choose which one to use.
Note that using implicit barriers will always be correct, just slower
in some case.

> 
> I think the documentation will need to be updated.

Yes, this was the point which started this thread: the documentation and
implementation are not in sync :)

> It would be nice if 
> the name of the flag BUS_SPACE_MAP_PREFETCHABLE somehow reflected the 
> fact that barriers were explicit vs. implicit.

Hum, I can't think of a better one. Do you have any suggestion ?

> >
> >>>I'm only talking about bus_space. Note that, *right now*, all bus_space
> >>>implementation have implicit ordered read/write, because most drivers 
> >>>don't
> >>>use bus_space_barrier() at all.
> >>>
> >>>
> >>>     
> >>>
> >>Ordering read/write is okay, but I don't want ordered read/read or
> >>write/write by default -- that would cause PIO performance to go down
> >>the drain.
> >>   
> >>
> >
> >Err, I don't understand. If you're making reference to devices like IDE
> >or wi, these read/read or write/write *have to* be ordered, as they all
> >access the same register address ! you don't want to have 2 access 
> >reordered,
> >as it would swap 2 bytes in your data stream.
> > 
> >
> Yes, they do have to be ordered.  And this works today because these are 
> located in uncached memory space, so the barrier isn't strictly 

And there are implicit barriers in the bus_space_read/write implementation
on hardware that have read prefetch/write buffer.

> needed.   More on that below, which will hopefully clear it up.
> 
> > 
> >
> >>If you look at e.g. the mips bus_space_align_chipdep.c, you find that it
> >>only does the implicit ordering for bus_space_xxxx_multi.   So I'm not
> >>sure I agree with your statement above.
> >>   
> >>
> >
> >Hum. So either mips don't need anything special for registers access 
> >barrier,
> >or some drivers will break. Most drivers assume implicit barrier.
> > 
> >
> Most device on MIPS plaforms are located in uncached memory space, so 
> the assumption is *usually* correct.
> 
> Yet the barrier operation is non-null.  One could argue reasonably that 
> this is a bug in the mips implementation.  But until this bug is fixed, 
> your proposed changes will have a bad effect on performance, at least on 
> MIPS hardware.

So I guess the barrier should check if the bus_space handle used is
mapped in cached or non-cached memory.

> 
> I wonder if other ports have similar issues.  I have not done a survey 
> of them.  Have you?

Only i386 and sparc64. On i386 no barrier are needed, on sparc64
none should be needed either, but I'm tracking down a bug in a driver
which could in fact be caused by write happening out of order.
On x86, I think there should be barrier anyway to prevent instruction
reordering at the CPU level (or possibly by the compiler, as
these are implemented as macros).
These have to be done port by port, and I don't know enouth details about
most of them.

> >
> >This I can't see why it would be slower than the current code. These
> >devices needs read/read and write/write barrier on every register
> >access.
> > 
> >
> Concrete example: wi(4) on mips.  Currently these just "work" because 
> the PCI space is mapped uncached.  But the barrier ops for that platform 
> issues a cache flush.
> 
> I guess a reasonable argument is that the platform code is busted 
> because it should know whether it is uncached or not, and do the "right" 
> thing.

Yes, clearly. This kind of detail should not be known to the Mi device
drivers. If we strictly followed the bus_space man page, right now,
wi(4) and the ide drivers should use barrier in the MI part of the
code.

> But it doesn't, and the end result of your changes will 
> seriously impact performance of these devices on (probably) all MIPS 
> platforms.  I suspect that other ports may have similar issues, but I 
> have not checked.

I didn't either. but bus_space read/write is already doing the right thing
or most driver wouldn't work properly. I think the mips port is a special
case, and I suspect it can be easily fixed.

-- 
Manuel Bouyer <bouyer@antioche.eu.org>
     NetBSD: 26 ans d'experience feront toujours la difference
--