Re: port-i386/38935: x86 bus_dmamap_sync() should use memory barrier

To: port-i386-maintainer%netbsd.org@localhost,gnats-admin%netbsd.org@localhost,netbsd-bugs%netbsd.org@localhost,bouyer%antioche.eu.org@localhost
Subject: Re: port-i386/38935: x86 bus_dmamap_sync() should use memory barrier
From: Manuel Bouyer <bouyer%antioche.eu.org@localhost>
Date: Thu, 12 Jun 2008 11:50:03 +0000 (UTC)

The following reply was made to PR port-i386/38935; it has been noted by GNATS.

From: Manuel Bouyer <bouyer%antioche.eu.org@localhost>
To: gnats-bugs%NetBSD.org@localhost
Cc: port-i386-maintainer%NetBSD.org@localhost, gnats-admin%NetBSD.org@localhost,
        netbsd-bugs%NetBSD.org@localhost
Subject: Re: port-i386/38935: x86 bus_dmamap_sync() should use memory barrier
Date: Thu, 12 Jun 2008 13:48:50 +0200

 On Tue, Jun 10, 2008 at 09:20:02PM +0000, Andrew Doran wrote:
 >  On Tue, Jun 10, 2008 at 06:30:00PM +0000, Manuel Bouyer wrote:
 >  
 >  >   the x86 implementation of bus_dmamap_sync() turns into a NOP if
 >  >   we're not using bounce buffers (which is the common case).
 >  >   Yet some DMA devices rely on the read or write operations
 >  >   happening in order (e.g. when the driver can add DMA descriptors to
 >  >   a queue while the device may also be looking at the queue).
 >  >   Typical examples are the uhci and ehci drivers (but they are not
 >  >   the only examples in our tree). Without an appropriate memory barrier,
 >  >   the CPU's write buffers can reorder writes in a way that leaves
 >  >   the DMA descriptors in an inconsistent state from the device's view.
 >  
 >  Writes are never reordered by the CPU unless the target memory region has
 >  been set as one of the weakly ordered types using an MTRR or PAE equivalent.
 >  For write-combining regions it could happen but that should never be the
 >  case for memory mapped for a device. In theory it can also happen with a
 >  write-back (Intel ordering) region if you use the SSE instructions, but we
 >  only use those in kernel for zeroing free pages.
 >  
 >  The tech docs from Intel are really confused about this issue, but the
 >  memory ordering rules are (in short):
 >  
 >  - loads can be reordered around other loads in any direction
 >  - a load after a store in program order can be reordered to occur before
 >    the store
 >  - a load before a store in program order _cannot_ be reordered to occur
 >    after the store
 >  - a store is strongly ordered with respect to all other stores
 >  - locked transactions prevent all reordering and flush store buffers
 >  
 >  amd64 is (was?) pretty much the same, except loads are never reordered
 >  around stores. 
 
 I've been rereading chapter 7 "memory system" of the "AMD64 Architecture
 Programmer's Manual Volume 2 (rev 3.13)". Actually loads can be reordered
 before stores from the memory POV (which is what matters for a bus-master
 device), because of the write buffers. A load can fetch the data directly
 from the write buffer, before the data is actually in cache or memory.
 An mfence or a locked instruction is needed to prevent that.
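
 As a rough illustration of the store-then-load case described above (a
 sketch only, not code from the tree): struct demo_desc, DEMO_OWN_DEVICE and
 demo_publish_and_poll() are hypothetical names, and C11's
 atomic_thread_fence(memory_order_seq_cst) stands in for x86_mfence(), on the
 assumption that the compiler implements it on x86 with mfence or a locked
 instruction.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical descriptor shared between the driver and a bus-master device. */
struct demo_desc {
	volatile uint32_t flags;	/* written by the driver */
	volatile uint32_t status;	/* written by the device */
};

#define DEMO_OWN_DEVICE	0x1u	/* hypothetical "device owns this" bit */

/*
 * Hand the descriptor to the device, then read back its status word.
 * Without the full fence, the load of d->status may be satisfied while
 * the store to d->flags still sits in the write buffer, i.e. before the
 * device can see it.
 */
static uint32_t
demo_publish_and_poll(struct demo_desc *d)
{
	d->flags |= DEMO_OWN_DEVICE;			/* store */
	atomic_thread_fence(memory_order_seq_cst);	/* full barrier */
	return d->status;				/* load */
}
```

 Whether a fence of this kind also constrains speculative loads is a
 separate question, which this sketch does not settle.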
 
 There is also a sentence in "AMD64 Architecture Programmer's Manual Volume 3
 (rev 3.13)" about lfence and mfence that I don't really understand:
 "The MFENCE instruction is weakly-ordered with respect to data and instruction
 prefetches.
 Speculative loads initiated by the processor, or specified explicitly using
 cache-prefetch instructions, can be reordered around an MFENCE."
 and
 "Speculative loads initiated by the processor, or specified explicitly using
 cache-prefetch instructions, can be reordered around an LFENCE."
 
 So I'm wondering if mfence/lfence is really what we need here, or
 if we'd need a locked instruction.
 
 
 >  > @@ -844,6 +854,24 @@
 >  >           printf("unknown buffer type %d\n", cookie->id_buftype);
 >  >           panic("_bus_dmamap_sync");
 >  >   }
 >  > +end:
 >  > + if ((ops & (BUS_DMASYNC_PREWRITE|BUS_DMASYNC_POSTWRITE)) &&
 >  > +     (ops & (BUS_DMASYNC_PREREAD|BUS_DMASYNC_POSTREAD))) {
 >  > +         /* both read and write, we need a full barrier */
 >  > +         x86_mfence();
 >  > + } else if (ops & (BUS_DMASYNC_PREWRITE|BUS_DMASYNC_POSTWRITE)) {
 >  > +         /*
 >  > +          * all past writes should have completed before this point,
 >  > +          * and future writes should not have started yet.
 >  > +          */
 >  > +         x86_sfence();
 >  > + } else if (ops & (BUS_DMASYNC_PREREAD|BUS_DMASYNC_POSTREAD)) {
 >  > +         /*
 >  > +          * all past reads should have completed before this point,
 >  > +          * and future reads should not have started yet.
 >  > +          */
 >  > +         x86_lfence();
 >  > + }
 >  
 >  I think x86_lfence() without any if() would be enough. Does it solve the
 >  problem for you?
 
 As a load can occur before a store hits memory, I think we in fact need
 x86_mfence() for all write cases, assuming mfence and lfence really
 prevent speculative loads.
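
 A minimal sketch of that selection logic, for illustration only and not a
 replacement for the patch: the DEMO_SYNC_* values are placeholders for the
 real BUS_DMASYNC_* flags, and enum demo_barrier and demo_pick_barrier() are
 hypothetical names. The function only reports which barrier would be
 chosen; it does not issue one.

```c
/* Placeholder flag values; the real ones come from the bus_dma headers. */
#define DEMO_SYNC_PREREAD	0x01
#define DEMO_SYNC_POSTREAD	0x02
#define DEMO_SYNC_PREWRITE	0x04
#define DEMO_SYNC_POSTWRITE	0x08

enum demo_barrier {
	DEMO_BARRIER_NONE,	/* no sync operation requested */
	DEMO_BARRIER_LOAD,	/* would map to x86_lfence() */
	DEMO_BARRIER_FULL	/* would map to x86_mfence() */
};

/*
 * Any write operation gets a full barrier, because a later load may
 * otherwise complete before the store reaches memory.  Read-only
 * operations get a load barrier.
 */
static enum demo_barrier
demo_pick_barrier(int ops)
{
	if (ops & (DEMO_SYNC_PREWRITE | DEMO_SYNC_POSTWRITE))
		return DEMO_BARRIER_FULL;
	if (ops & (DEMO_SYNC_PREREAD | DEMO_SYNC_POSTREAD))
		return DEMO_BARRIER_LOAD;
	return DEMO_BARRIER_NONE;
}
```

 Compared with the quoted diff, the sfence-only branch disappears: write-only
 and read-plus-write requests both end up with the full barrier.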
 
 -- 
 Manuel Bouyer, LIP6, Universite Paris VI.           Manuel.Bouyer%lip6.fr@localhost
      NetBSD: 26 ans d'experience feront toujours la difference
 --

