Port-amd64 archive
Re: x86 bus_dmamap_sync

> Date: Fri, 27 Oct 2017 23:08:23 +0000
> From: Taylor R Campbell <riastradh@NetBSD.org>
>
> As noted by Mouse in a thread elsewhere, it looks like the LFENCE in
> x86's bus_dmamap_sync in the POSTREAD case is in the wrong place.

Correction: Since yamt@ committed a fix for PR 21665 in 2004, there
has been an LFENCE (or a stronger serializing instruction, on i386) in
bus_dmamap_sync proper, in the POSTREAD case, _before_ calling the
internal _bus_dmamap_sync logic.  So now I am puzzled by the LFENCE
and MFENCE that bouyer@ added back in 2008 for PR 38935 -- it seems to
me that MFENCE is unnecessary for PREWRITE/POSTWRITE, that LFENCE is
not enough for PREREAD, which I think needs an MFENCE, and that a
second LFENCE for POSTREAD is redundant.

I've attached a patch with a detailed comment explaining exactly what
ordering of loads and stores I believe we need in each case, and the
weakest x86 memory barrier that is needed to guarantee that ordering.
The summary that I've provisionally convinced myself of is:

   [Pre-DMA: driver is about to issue store to device register to
   initiate DMA transfer.]

   PREREAD     MFENCE   because program-prior loads must bus-precede
                        program-subsequent store to device register
                        that initiates DMA

   PREWRITE    nothing  because x86 guarantees bus-order of stores
                        matches program-order of stores, and loads
                        are irrelevant

   [Post-DMA: driver just issued load from device register to learn of
   DMA completion.]

   POSTREAD    LFENCE   because program-prior load from device
                        register that notifies of DMA completion must
                        bus-precede program-subsequent loads to read
                        data

   POSTWRITE   nothing  because x86 guarantees program-order
                        load/store is always issued as bus-order
                        load/store, never as bus-order store/load
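
To make the placement concrete, here is a minimal sketch of a driver
receive path under the rules above.  The device, its registers
(HYPO_CSR and its bits), and the softc layout are all invented for
illustration; the only point is where the two implied fences land
relative to the MMIO store and load:

	#include <sys/bus.h>

	#define HYPO_CSR	0x00	/* hypothetical csr offset */
	#define HYPO_CSR_START	0x01	/* hypothetical "start" bit */
	#define HYPO_CSR_DONE	0x02	/* hypothetical "done" bit */

	struct hypo_softc {
		bus_space_tag_t		sc_bst;
		bus_space_handle_t	sc_bsh;
		bus_dma_tag_t		sc_dmat;
		bus_dmamap_t		sc_rxmap; /* maps sc_rxbuf */
		void			*sc_rxbuf;
	};

	static void
	hypo_rx(struct hypo_softc *sc, bus_size_t len)
	{

		/*
		 * PREREAD implies MFENCE: program-prior loads from
		 * sc_rxbuf must complete before the store below that
		 * lets the device start writing into it.
		 */
		bus_dmamap_sync(sc->sc_dmat, sc->sc_rxmap, 0, len,
		    BUS_DMASYNC_PREREAD);

		/* Store to device register initiating the transfer. */
		bus_space_write_4(sc->sc_bst, sc->sc_bsh, HYPO_CSR,
		    HYPO_CSR_START);

		/* Load from device register to learn of completion. */
		while ((bus_space_read_4(sc->sc_bst, sc->sc_bsh,
		    HYPO_CSR) & HYPO_CSR_DONE) == 0)
			continue;

		/*
		 * POSTREAD implies LFENCE: the loads that read
		 * sc_rxbuf below must not be issued before the
		 * completion load above.
		 */
		bus_dmamap_sync(sc->sc_dmat, sc->sc_rxmap, 0, len,
		    BUS_DMASYNC_POSTREAD);

		/* Only now is it safe to read the DMA'd data. */
	}

The transmit direction is the same skeleton with PREWRITE/POSTWRITE,
which by the argument above need no fence at all on x86.
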
It also seems to me that it would be better to use membar_sync and
membar_consumer than x86_mfence and x86_lfence, so that we can take
advantage of the patching to actually use MFENCE and LFENCE on i386
when SSE2 is available.

All that said, it is possible that the guarantees I cited above do not
hold either (in the PREWRITE case) between a store to main memory and
a subsequent store to memory-mapped I/O, or (in the POSTWRITE case)
between a load from memory-mapped I/O and a subsequent store to main
memory[*].  If not, PREWRITE would need an SFENCE and POSTWRITE an
MFENCE.  I'd find that a little surprising, but I'm not an x86 wizard.

[*] Certainly these guarantees _do_ apply when it is an I/O
instruction rather than a memory-mapped I/O load/store, since I/O
instructions are all serializing.  Of course, in that case, no memory
barriers are needed at all.
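
For concreteness, if those guarantees turned out not to extend across
ordinary memory and MMIO, the fences in _bus_dmamap_sync would have to
look something like the following.  This is a hypothetical alternative,
not what the patch below does, and it assumes the x86_sfence() helper
that sits next to x86_lfence()/x86_mfence() in x86's cpufunc.h:

	if (ops & BUS_DMASYNC_PREREAD)
		x86_mfence();	/* buffer loads before DMA-initiating
				   store, as in the patch */
	if (ops & BUS_DMASYNC_PREWRITE)
		x86_sfence();	/* buffer stores before DMA-initiating
				   MMIO store */
	if (ops & BUS_DMASYNC_POSTREAD)
		x86_lfence();	/* completion load before buffer
				   loads, as in the patch */
	if (ops & BUS_DMASYNC_POSTWRITE)
		x86_mfence();	/* completion MMIO load before buffer
				   stores */
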
Index: sys/arch/x86/x86/bus_dma.c
===================================================================
RCS file: /cvsroot/src/sys/arch/x86/x86/bus_dma.c,v
retrieving revision 1.77
diff -p -u -r1.77 bus_dma.c
--- sys/arch/x86/x86/bus_dma.c 31 Jul 2017 19:29:19 -0000 1.77
+++ sys/arch/x86/x86/bus_dma.c 28 Oct 2017 01:30:55 -0000
@@ -813,11 +813,79 @@ _bus_dmamap_sync(bus_dma_tag_t t, bus_dm
#endif
/*
+ * Pre-DMA operations, before the caller issues a store to
+ * initiate a DMA transfer:
+ *
+ * PREREAD
+ *
+ * All stores to the DMA buffer must be completed before
+ * the caller issues a store to initiate the DMA transfer,
+ * so that we don't step on the DMA engine's toes while
+ * it's reading. No fence needed because x86 stores are
+ * always issued in program order, outside write-combining
+ * regions of memory and non-temporal stores.
+ *
+ * All loads from the DMA buffer must be completed before
+ * the caller issues a store to initiate the DMA transfer,
+ * so that the DMA transfer doesn't clobber them. Hence
+ * MFENCE, via membar_sync.
+ *
+ * (If we have to bounce, the MFENCE may not be necessary,
+ * but it doesn't hurt either, and I'd wildly guess the
+ * performance impact of an MFENCE is negligible compared
+ * to the performance impact of using a bounce buffer.)
+ *
+ * PREWRITE
+ *
+ * All stores to the DMA buffer must be completed before
+ * the caller issues a store to initiate a DMA transfer,
+ * so that the DMA engine transfers the data we want it
+ * to. No fence needed because x86 stores are always
+ * issued in program order, outside write-combining
+ * regions of memory and non-temporal stores.
+ *
+ * The DMA engine will issue no stores to the DMA buffer,
+ * so loads can happen before or after the DMA transfer
+ * without trouble. No fence needed.
+ *
+ * Post-DMA, after the caller has issued a load from a device
+ * register informing it that the DMA transfer has completed:
+ *
+ * POSTREAD
+ *
+ * No stores to the DMA buffer may be issued until after
+ * the load that informed the caller that the DMA transfer
+ * had completed. No fence needed because x86 never
+ * reorders program-order load/store to store/load.
+ *
+ * No loads from the DMA buffer may be issued until after
+ * the load that informed the caller that the DMA transfer
+ * had completed. Hence LFENCE, via membar_consumer, and
+ * be sure to do so before we bounce.
+ *
+ * POSTWRITE
+ *
+ * No stores to the DMA buffer may be issued until after
+ * the load that informed the caller that the DMA transfer
+ * had completed. No fence needed because x86 never
+ * reorders program-order load/store to store/load.
+ *
+ * The DMA engine will issue no stores to the DMA buffer,
+ * so loads can happen before or after the DMA transfer
+ * without trouble. No fence needed.
+ */
+
+ if (ops & BUS_DMASYNC_PREREAD)
+ membar_sync();
+ if (ops & BUS_DMASYNC_POSTREAD)
+ membar_consumer();
+
+ /*
* If we're not bouncing, just return; nothing to do.
*/
if (len == 0 || cookie == NULL ||
(cookie->id_flags & X86_DMA_IS_BOUNCING) == 0)
- goto end;
+ return;
switch (cookie->id_buftype) {
case X86_DMA_BUFTYPE_LINEAR:
@@ -940,21 +1008,6 @@ _bus_dmamap_sync(bus_dma_tag_t t, bus_dm
cookie->id_buftype);
break;
}
-end:
- if (ops & (BUS_DMASYNC_PREWRITE|BUS_DMASYNC_POSTWRITE)) {
- /*
- * from the memory POV a load can be reordered before a store
- * (a load can fetch data from the write buffers, before
- * data hits the cache or memory), a mfence avoids it.
- */
- x86_mfence();
- } else if (ops & (BUS_DMASYNC_PREREAD|BUS_DMASYNC_POSTREAD)) {
- /*
- * all past reads should have completed at before this point,
- * and future reads should not have started yet.
- */
- x86_lfence();
- }
}
/*
@@ -1336,9 +1389,6 @@ bus_dmamap_sync(bus_dma_tag_t t, bus_dma
return;
}
- if (ops & BUS_DMASYNC_POSTREAD)
- x86_lfence();
-
_bus_dmamap_sync(t, p, o, l, ops);
}