NetBSD-Bugs archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: port-i386/38935: x86 bus_dmamap_sync() should use memory barrier



On Tue, Jun 10, 2008 at 09:20:02PM +0000, Andrew Doran wrote:
>  On Tue, Jun 10, 2008 at 06:30:00PM +0000, Manuel Bouyer wrote:
>  
>  >    the x86 implementation of bus_dmamap_sync() turn into a NOP if
>  >    we're not using bouce-buffers (which is the common case).
>  >    Yet some DMA devices rely on the read or write operations
>  >    happening in order (e.g. when the driver can add DMA descripors to
>  >    a queue while the device may also be looking at the queue).
>  >    Typical examples are the uhci and ehci drivers (but it's not
>  >    the only example in our tree). Without appropriate memory barrier,
>  >    the CPU's write buffers can reorder writes in a may that makes
>  >    the DMA descriptors in an inconsistent state from the device's view.
>  
>  Writes are never reordered by the CPU unless the target memory region has
>  been set as one of the weakly ordered types using an MTRR or PAE equivalent.
>  For write-combining regions it could happen but that should never be the
>  case for memory mapped for a device. In theory it can also happen with a
>  write-back (Intel ordering) region if you use the SSE instructions, but we
>  only use those in kernel for zeroing free pages.
>  
>  The tech docs from Intel are really confused about this issue, but the
>  memory ordering rules are (in short):
>  
>  - loads can be reordered around other loads in any direction
>  - a load after a store in program order can be reordered to occur before
>    the store
>  - a load before a store in program order _cannot_ be reordered to occur
>    after the store
>  - a store is strongly ordered with respect to all other stores
>  - locked transactions prevent all reordering and flush store buffers
>  
>  amd64 is (was?) pretty much the same, except loads are never reordered
>  around stores. 

I've been rereading chapter 7 "memory system" of the "AMD64 Architecture
Programmer's Manual Volume 2 (rev 3.13)". Actually loads can be reordered
before stores from the memory POV (which is what matters for a bus-master
device), because of the write buffers. A load can fetch the data directly
from the write buffer, before the data is actually in cache or memory.
A mfence or locked instruction is needed to prevent that.

there is also a sentence in "AMD64 Architecture Programmer's Manual Volume 3
(rev 3.13)" about lfence and mfence that I don't really understand:
"The MFENCE instruction is weakly-ordered with respect to data and instruction
prefetches.
Speculative loads initiated by the processor, or specified explicitly using
cache-prefetch instructions, can be reordered around an MFENCE."
and
"Speculative loads initiated by the processor, or specified explicitly using
cache-prefetch instructions, can be reordered around an LFENCE."

so I'm wondering if mfence/lfence is really what we need here, or
if we'd need a locked instruction.


>  > @@ -844,6 +854,24 @@
>  >            printf("unknown buffer type %d\n", cookie->id_buftype);
>  >            panic("_bus_dmamap_sync");
>  >    }
>  > +end:
>  > +  if ((ops & (BUS_DMASYNC_PREWRITE|BUS_DMASYNC_POSTWRITE)) &&
>  > +      (ops & (BUS_DMASYNC_PREREAD|BUS_DMASYNC_POSTREAD))) {
>  > +          /* both read and write, we need a full barrier */
>  > +          x86_mfence();
>  > +  } else if (ops & (BUS_DMASYNC_PREWRITE|BUS_DMASYNC_POSTWRITE)) {
>  > +          /*
>  > +           * all past writes should have completed before this point,
>  > +           * and futur writes should not have started yet.
>  > +           */
>  > +          x86_sfence();
>  > +  } else if (ops & (BUS_DMASYNC_PREREAD|BUS_DMASYNC_POSTREAD)) {
>  > +          /*
>  > +           * all past reads should have completed before this point,
>  > +           * and futur reads should not have started yet.
>  > +           */
>  > +          x86_lfence();
>  > +  }
>  
>  I think x86_lfence() without any if() would be enough. Does it solve the
>  problem for you?

As a load can occurs before a store hit memory, I think we need
x86_mfence() for all write cases, in fact. Assuming mfence and lfence really
prevent speculative loads.

-- 
Manuel Bouyer, LIP6, Universite Paris VI.           
Manuel.Bouyer%lip6.fr@localhost
     NetBSD: 26 ans d'experience feront toujours la difference
--


Home | Main Index | Thread Index | Old Index