Port-i386 archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

x86 bus_dmamap_sync

As noted by Mouse in a thread elsewhere, it looks like the LFENCE in
x86's bus_dmamap_sync in the POSTREAD case is in the wrong place.

Right now, the order of events is:

1. DMA engine writes to main memory.
2. DMA engine sets flag indicating DMA completed and interrupts CPU.
3. CPU driver interrupt routine reads flag indicating DMA completed.
4. CPU calls bus_dmamap_sync.
   ---> (a) memcpy from bounce buffer where DMA wrote data, to
            original buffer where caller wants data.
        (b) LFENCE
5. CPU reads data out of original buffer.

The DMA engine guarantees that the stores to main memory have
completed before it sets the flag indicating DMA completion.  We need
to guarantee that the CPU reads the flag indicating DMA completion
before it reads data from the buffer that the DMA engine wrote to, so
that it doesn't read any data before the DMA engine had finished
writing it.

However, if the DMA engine was writing to a bounce buffer, we memcpy
from the bounce buffer before we issue the LFENCE that orders loads,
so we may memcpy stale data.

This matters only in the case of a bounce buffer, because the LFENCE
always happens before the caller tries to read anything on the CPU.
So we probably don't often see this case, since we generally want to
minimize the use of bounce buffers anyway.

Am I missing something x86-specific here?  Is there something really
weird about x86 and LFENCE that means it is correct to issue it there,
or does the attached patch fix a low-probability memory ordering bug?

In the PREREAD case, bus_dmamap_sync doesn't have any other side
effects, so moving the LFENCE earlier in that case should make no
difference.  We still need LFENCE in the PREREAD case: without it, the
CPU may try to delay prior loads of data from the buffer until _after_
the DMA engine has already started.  In fact, maybe in that case we
actually need MFENCE, because LFENCE makes no guarantees about the
ordering of loads with the _store_ that the driver invariably issues
immediately after bus_dmamap_sync to set the DMA engine in motion.
Index: sys/arch/x86/x86/bus_dma.c
RCS file: /cvsroot/src/sys/arch/x86/x86/bus_dma.c,v
retrieving revision 1.77
diff -p -u -r1.77 bus_dma.c
--- sys/arch/x86/x86/bus_dma.c	31 Jul 2017 19:29:19 -0000	1.77
+++ sys/arch/x86/x86/bus_dma.c	27 Oct 2017 22:45:04 -0000
@@ -812,6 +812,15 @@ _bus_dmamap_sync(bus_dma_tag_t t, bus_dm
+		/*
+		 * all past reads should have completed at before this
+		 * point, and future reads including copy from the
+		 * bounce buffer should not have started yet.
+		 */
+		x86_lfence();
+	}
 	 * If we're not bouncing, just return; nothing to do.
@@ -948,12 +957,6 @@ end:
 		 * data hits the cache or memory), a mfence avoids it.
-		/*
-		 * all past reads should have completed at before this point,
-		 * and future reads should not have started yet.
-		 */
-		x86_lfence();

Home | Main Index | Thread Index | Old Index