Subject: Re: bus_dmamap_sync not enough ?
To: Stephan Uphoff <ups@stups.com>
From: Jason R Thorpe <thorpej@wasabisystems.com>
List: tech-kern
Date: 02/13/2003 08:39:36
On Thu, Feb 13, 2003 at 10:55:55AM -0500, Stephan Uphoff wrote:

 > I guess I should stop opening interesting looking boxes ...

Heh :-)

 > 1) Some of the ugly problems that are not mentioned on the manual page for 
 > bus_dmamap_sync:

If you want to write clarifying text for the manual page, that would
be terrific.

 > a)  bus_dmamap_sync might sync additional data outside the [offset,len] 
 > interval due to
 >    cache line granularity.
 >    Let us call the whole area the bus_dmamap_sync footprint.
 >                                  ---------------------------

Yes, this is one of the reasons that it is important to use only the
sync operations which are required.

 > b) Unfortunately there is currently no machine independent way to determine the
 >    bus_dmamap_sync footprint of a given bus_dmamap_sync call.

Yah, one possible enhancement to the API might be to add a call which
takes a bus_dma_tag_t and returns the "sync footprint" (which would
essentially be the cacheline size, in most cases).  Then drivers would
make intelligent decisions about how to align data structures, etc.
based on that information.

 > c) Data in the bus_dmamap_sync footprint should not be accessed by the 
 > processor
 >    if a BUS_DMASYNC_PREREAD is in effect for the bus_dmamap_sync footprint.
 >    ( Example: XSCALE architecture?)
 > 
 >    Problems:
 >     Reading data that falls in the bus_dmamap_sync footprint might cause a 
 > cacheline to be loaded.
 >     This cacheline might persist in the cache and present outdated memory 
 > state to the CPU
 >     even after a BUS_DMASYNC_POSTREAD sync operation.
 >     The outdated state in the cacheline can be flushed by a 
 > BUS_DMASYNC_PREREAD sync.
 >     ( This could be fixed by  flushing cache lines on BUS_DMASYNC_POSTREAD)

No, I don't think this is a problem.  If you determine that the device is
going to update the memory location again, then you issue another PREREAD,
which will do the necessary non-writeback-invalidate.  This is what current
drivers do.

Basically, the CPU should not even try to access DMA memory (read *or* write)
while the device "owns" it (i.e. between PRE and POST ops).

 >     Writing data also might lead to a stale cache line - however this condition
 >     can not be reverted on a write-back cache (write through should be ok)
 >     since the write marks the cacheline as dirty. (and a flush will just save 
 > it to memory)
 >     ( This might revert dma region memory with which it shares a cache line to 
 > an earlier state)

...right, which is why the CPU should not dirty that cache line (by issuing
a write) until it knows it is safe to do so (i.e. knows the device is not
going to update that region of memory again).

 > d) DMA_READ operation on memory not currently synced as BUS_DMASYNC_PREREAD.
 > 
 >    Same problems as (c) and the following additional problem that the
 >    CPU might see some (but not all)  data from dma transfers and the CPU might 
 > not see the
 >    data in the order it was transfered by DMA.
 >    Again BUS_DMASYNC_PREREAD should synchronize.

Err, right, a DMA read which is not preceeded by a PREREAD operation is
basically incorrect use of the bus_dma API, and thus results in undefined
behavior :-)

 > e) DMA_WRITE operation on memory not currently synced as BUS_DMASYNC_PREWRITE. 
 > 
 >     
 >    DMA access could see some data written by the CPU before a 
 > BUS_DMASYNC_PREWRITE operation
 >    and it might see the data out of order.

Ditto.

 > 2) Examples of race conditions in the i82557 driver:
 > 
 > (I)
 > Adding a new receive buffer at the end of the DMA queue.
 > File i82257var.h:  
 > 	#define	FXP_INIT_RFABUF(sc, m  \
 > 	      [...]
 > 		FXP_RFASYNC((sc), __p_m,				\
 > 		    BUS_DMASYNC_POSTREAD|BUS_DMASYNC_POSTWRITE);	\
 > 		/* ## a ## */
 > 		memcpy((void *)&__p_rfa->link_addr, &__v,		\
 > 		    sizeof(__v));					\
 > 		__p_rfa->rfa_control &= htole16(~FXP_RFA_CONTROL_EL);	\
 > 	        /* ## b ## */
 > 		FXP_RFASYNC((sc), __p_m,				\
 > 		    BUS_DMASYNC_PREREAD|BUS_DMASYNC_PREWRITE);		\
 > 
 > 
 > Between /* ## a ## */ and /* ## b ## */ DMA reads might be active for the RFA 
 > and
 > for the frame data area.

Yes, and there isn't really any good way to fix this problem.  The i82557
was clearly designed with blinders on, and relies on at least two features
of the IA-32:

	1. Fully coherent memory system.
	2. Ability to atomically update a 16-bit word in memory.

Luckily, the driver is able to arrange things to that we always win the
race.  There's not really much that can be done about it, in the case
of the i82557.

 > Non-cacheable memory would do the trick but:
 > Jason R Thorpe wrote: [..] driver should not rely on BUS_DMA_COHERENT for 
 > correct operation [...]

Right, see above ... this is a "feature" of the i82557.

 > (II)
 > File i82257.c
 > fxp_rxintr(struct fxp_softc *sc)
 > 	[...]
 > 
 > 	FXP_RFASYNC(sc, m,
 > 		    BUS_DMASYNC_POSTREAD|BUS_DMASYNC_POSTWRITE);
 > 
 > 		/** A **/
 > 		rxstat = le16toh(rfa->rfa_status);
 > 
 > 		if ((rxstat & FXP_RFA_STATUS_C) == 0) {
 > 			/*
 > 			 * We have processed all of the
 > 			 * receive buffers.
 > 			 */
 > 			FXP_RFASYNC(sc, m, BUS_DMASYNC_PREREAD);
 > 			return;
 > 		}
 > 
 > 		IF_DEQUEUE(&sc->sc_rxq, m);
 > 
 > 		FXP_RXBUFSYNC(sc, m, BUS_DMASYNC_POSTREAD);
 > 		/** B **/
 > 		len = le16toh(rfa->actual_size) &
 > 		    (m->m_ext.ext_size - 1);
 > 
 > 		if (len < sizeof(struct ether_header)) {
 > 			/*
 > 			 * Runt packet; drop it now.
 > 			 */
 > 			FXP_INIT_RFABUF(sc, m);
 > 			continue;
 > 		}
 > 
 > 
 > The rfa might be dma-updated at point /** A **/
 > Due to problem d) the driver might see the update of the rfa_status field but 
 > not
 > the update of the actual_size field at point /** B **/

No.  The chip does not transfer ownership of the RFA to the host (by setting
the C bit) until the entire RFA descriptor is updated (I have PCI bus analyzer
traces that confirm this :-), thus if the host sees C, then it is guaranteed
that all of the fields in the RFA are valid.  This is what all descriptor-
based messaging systems do with the "owner bit".

Since the driver does not modify the RFA memory (does not dirty cache lines),
and issues only a PREREAD, there is no chance that the CPU's view of memory
will overwrite the device's view.

-- 
        -- Jason R. Thorpe <thorpej@wasabisystems.com>