Subject: Re: Proposal for modification of bus_dma(9)
To: Ross Harvey <ross@teraflop.com>
From: Jason Thorpe <thorpej@nas.nasa.gov>
List: tech-kern
Date: 02/02/1998 19:47:24
[ With any luck, this message will actually make it out to the world.  The
  network problems at MAE West are becoming a ... large source of annoyance
  for me right now. ]

On Sun, 1 Feb 1998 05:06:45 -0800 (PST) 
 Ross Harvey <ross@teraflop.com> wrote:

 > I do want to state for the record that I think bus_dma(9) is a work
 > of exceptional merit, and that I will continue to hold that opinion
 > even if BUS_DMAMEM_NOSYNC is allowed to live.

Heh, thanks :-)  Given some comments from Jonathan Stone, I think I
may indeed allow it to live, but in a slightly different way; see below.

 > Also for the record, Avalon has not at all pressured Jason into
 > twisting the interface for our benefit.  The only actual request we
 > made was that the feature not be _used_ in a single driver project
 > Matthew Jacob was doing at NAS.

Yah, I didn't mean to imply anything other than the A12 hardware has
caused me to think about it some more :-)

 > I point out that the NOSYNC mapping is the one and only case where the
 > actual hardware buffer (possibly not quite the same as what the driver
 > _thinks_ is the buffer) must be addressable by the kernel driver.  Except
 > for NOSYNC, bus_dma would, in an absurd example that demonstrates the
 > abstraction, allow the control of a dma peripheral not even on the same
 > computer!

That's a very good point.  And I think that allowing this has value
in some interesting applications.

 > This is the first I've heard of "partial sync", and I like it. It adds
 > efficiency and it "lightens" the interface. It applies directly, for
 > example, to an issue raised by Matt Thomas in the most recent driver
 > busification.  Seems like a win.

[ Example: re-ordered stores. ]

So, here is my somewhat fleshed out idea on how to change the bus_dma
interface to deal with these problems:

	(1) Change the BUS_DMAMEM_NOSYNC flag to BUS_DMA_COHERENT.  This
	    is a *hint*, and nothing more.  It will be passed only to
	    bus_dmamem_map().  The semantics are:

		bus_dmamem_map: If possible on a given platform,
		map the memory in such a way that it will be DMA
		coherent.  This may include mapping the pages into
		uncached address space or setting the cache-inhibit
		bits in page table entries.  If a given platform cannot
		map the host RAM in a way that is DMA coherent, this flag
		is ignored.

		bus_dmamap_load*: When a dmamap is loaded, the
		machine-dependent code will take whatever action
		is necessary to determine if the memory is mapped
		in a DMA coherent way.  This may include checking
		if the KVA lies in uncached address space or if
		the page table entries have the cache-inhibit bits
		set.  If so, state is kept in the dmamap to indicate
		this to later invocations of bus_dmamap_sync().

	(2) Add the following public member to bus_dmamap_t, per
	    my previous message:

		int dm_mapsize;	The size of the current DMA mapping.
				A size of 0 indicates the mapping is
				invalid.

	    Note that dm_mapsize == 0 replaces dm_nsegs == 0 as the
	    standard way of determining if a dmamap contains a
	    valid mapping.

	(3) Change the bus_dmamap_sync() interface per my previous
	    message:

		void bus_dmamap_sync __P((bus_dma_tag_t tag,
			bus_dmamap_t dmamap, bus_addr_t offset,
			bus_size_t len, int ops));

		offset	offset into the mapping to synchronize

		len	length of mapping from offset to synchronize

		ops	one or more DMA synchronization operations

	    Valid synchronization operations:

		BUS_DMASYNC_PREREAD
		BUS_DMASYNC_PREWRITE

		BUS_DMASYNC_POSTREAD
		BUS_DMASYNC_POSTWRITE

	    Synchronization operations are expressed from the perspective
	    of the host RAM, e.g. a device -> memory operation is a READ,
	    and a memory -> device operation is a WRITE.

	    bus_dmamap_sync() may consult state within the dmamap to
	    determine if the memory is mapped in a DMA coherent way.
	    If so, bus_dmamap_sync() may elect to skip certain expensive
	    operations, such as flushing the data cache (esp. on systems
	    which cannot flush specific ranges of the cache).

	    On platforms which implement re-ordered stores, bus_dmamap_sync()
	    will always cause the store buffer to be flushed.
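
As a rough illustration of items (1) through (3), here is a minimal
user-space sketch, not an actual implementation.  Only dm_mapsize and the
BUS_DMASYNC_* names come from the proposal; the mock_* names, the
md_coherent field, the numeric flag values, and the int return value (the
proposed bus_dmamap_sync() returns void) are assumptions made so the logic
can be exercised outside a kernel.  The counters stand in for real cache
and store-buffer operations.

```c
#include <assert.h>
#include <stddef.h>

/* Operation names from the proposal; the numeric values are assumed. */
#define BUS_DMASYNC_PREREAD	0x01
#define BUS_DMASYNC_POSTREAD	0x02
#define BUS_DMASYNC_PREWRITE	0x04
#define BUS_DMASYNC_POSTWRITE	0x08

/* Hypothetical stand-in for bus_dmamap_t. */
struct mock_dmamap {
	size_t	dm_mapsize;	/* 0 means no valid mapping, per item (2) */
	int	md_coherent;	/* assumed MD state recorded at load time */
};

/* Counters standing in for the real hardware operations. */
static int cache_flushes;
static int store_buffer_drains;

/*
 * Hypothetical load step: remember the mapping size and whether the
 * memory turned out to be mapped DMA coherent, per item (1).
 */
static void
mock_dmamap_load(struct mock_dmamap *map, size_t size, int coherent)
{
	assert(map != NULL);
	map->dm_mapsize = size;
	map->md_coherent = coherent;
}

/*
 * Hypothetical partial sync, per item (3).  Returns 0 on success and
 * -1 if the map is not loaded or the range falls outside the mapping.
 */
static int
mock_dmamap_sync(struct mock_dmamap *map, size_t offset, size_t len, int ops)
{
	if (map->dm_mapsize == 0)
		return -1;		/* no valid mapping */
	if (offset > map->dm_mapsize || len > map->dm_mapsize - offset)
		return -1;		/* range outside the mapping */
	if (ops == 0)
		return 0;		/* nothing requested */

	/* Cache work is skipped when the memory is mapped DMA coherent. */
	if (!map->md_coherent)
		cache_flushes++;

	/* The store buffer is drained regardless of coherence. */
	store_buffer_drains++;
	return 0;
}
```

The point of the sketch is only that the coherence decision is made once,
at load time, and that the store buffer drain does not depend on it.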

So, in the case of Matt's Tulip transmit descriptor (assuming all of
the transmit descriptors are mapped by a single DMA mapping, for
simplicity):

	/*
	 * Fill in the descriptor with the mapping just created for
	 * this mbuf chain.
	 *
	 * No need to POSTREAD|POSTWRITE here, since that was done
	 * when the last "transmit complete" interrupt occurred for
	 * this descriptor.
	 */
	txdesc[idx].addr = dmamap->dm_segs[0].ds_addr;
	txdesc[idx].len = dmamap->dm_segs[0].ds_len;
	bus_dmamap_sync(dmat, dmamap, TXDESCOFF(idx), TXDESCSIZE,
	    BUS_DMASYNC_PREWRITE);
	txdesc[idx].flags |= TXDESC_VALID;
	bus_dmamap_sync(dmat, dmamap, TXDESCOFF(idx), TXDESCSIZE,
	    BUS_DMASYNC_PREREAD|BUS_DMASYNC_PREWRITE);


Note that this will not cause any real additional overhead on systems
which don't need all of this song and dance, because the synchronization
would optimize out to a noop.
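
One way this could look, purely as a sketch: a port whose DMA is always
coherent and whose stores are never re-ordered might define the sync
operation as an empty macro, so the calls in the driver compile away.
MOCK_PLATFORM_NEEDS_DMA_SYNC, mock_bus_dmamap_sync and mock_sync_work are
hypothetical names used only for this illustration, and the argument list
is shortened for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Counts how often real synchronization work was performed. */
static int mock_sync_work;

#ifdef MOCK_PLATFORM_NEEDS_DMA_SYNC
/* Platform that needs the song and dance: do the (mock) work. */
static void
mock_bus_dmamap_sync(size_t offset, size_t len, int ops)
{
	assert(len > 0);
	(void)offset;
	(void)ops;
	mock_sync_work++;
}
#else
/* Coherent, in-order platform: the call expands to nothing useful. */
#define mock_bus_dmamap_sync(offset, len, ops) \
	((void)(offset), (void)(len), (void)(ops))
#endif
```

Built without MOCK_PLATFORM_NEEDS_DMA_SYNC, a driver's sync calls leave
mock_sync_work untouched and generate no code worth mentioning.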

I can get cracking on this this week if we reach consensus here.

Jason R. Thorpe                                       thorpej@nas.nasa.gov
NASA Ames Research Center                            Home: +1 408 866 1912
NAS: M/S 258-5                                       Work: +1 650 604 0935
Moffett Field, CA 94035                             Pager: +1 415 428 6939