Subject: On the subject of bus_dma(9)
To: None <tech-kern@netbsd.org>
From: Jason R Thorpe <thorpej@zembu.com>
List: tech-kern
Date: 03/06/2001 13:35:25
Sorry that it has taken me a while to really chime in on this, but
I have been pretty busy with other things.

There is currently some confusion as to how the bus_dma(9) interface
works, in particular the meaning of BUS_DMA_COHERENT, and the use of
bus_dmamem_alloc(), bus_dmamem_map(), and bus_dmamap_load_raw().

First of all, let me point out that the bus_dma(9) manual page documents
the interface, and my original bus_dma(9) paper is in the source tree
in share/doc/papers/bus_dma.

Let me start by clarifying BUS_DMA_COHERENT.

BUS_DMA_COHERENT is a valid flag to bus_dmamem_map() and bus_dmamem_mmap(),
as documented in the bus_dma(9) manual page.  It is not a valid flag to
any other bus_dma function, and the documentation mentions the flag only
in the description of bus_dmamem_map() and bus_dmamem_mmap().

The BUS_DMA_COHERENT flag is defined to be a hint (and only a hint) to
machine-dependent code.  The hint requests that the MD backend map the
memory into the processor address space in a DMA coherent fashion, if
possible.  This may include mapping the pages via a non-cacheable segment,
or setting cache-inhibit bits in page table or TLB entries.  The intended
use of this flag is for e.g. control blocks, which are accessed often by
both the CPU and the device.  On some architectures, the cache flushes
that might be required by the sync operations surrounding the memory access
by the CPU could be expensive -- more expensive than simply doing un-cached
access.  The COHERENT flag allows MD code to optimize this situation as it
sees fit.  For example, on the DECstation, COHERENT causes memory to be
mapped by KSEG1 (un-cached direct-mapped segment), and bus_dmamap_load()
notices KSEG1 addresses, and marks the map as "coherent".  bus_dmamap_sync()
tests for this flag, and skips cache flushes if the flag is set.
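
As a rough illustration of the DECstation behaviour just described, here
is a user-space toy model in C.  It is not the actual MD backend code:
every toy_* name is made up for this sketch, and the only hard fact it
relies on is the 32-bit MIPS KSEG1 range (0xa0000000 through 0xbfffffff).
The "flush" is just a counter standing in for a real cache operation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit MIPS KSEG1: the unmapped, un-cached kernel segment. */
#define TOY_KSEG1_START	0xa0000000u
#define TOY_KSEG1_END	0xbfffffffu

#define TOY_MAP_COHERENT	0x01	/* hypothetical private map flag */

struct toy_dmamap {
	int		flags;		/* TOY_MAP_COHERENT or 0 */
	unsigned	flush_count;	/* flushes done by toy_dmamap_sync() */
};

bool
toy_is_kseg1(uint32_t va)
{
	return va >= TOY_KSEG1_START && va <= TOY_KSEG1_END;
}

/* Model of the load step: note whether the buffer is mapped un-cached. */
void
toy_dmamap_load(struct toy_dmamap *map, uint32_t buf_va)
{
	assert(map != NULL);
	map->flags = toy_is_kseg1(buf_va) ? TOY_MAP_COHERENT : 0;
}

/* Model of the sync step: the driver always calls it, the backend decides. */
void
toy_dmamap_sync(struct toy_dmamap *map)
{
	assert(map != NULL);
	if (map->flags & TOY_MAP_COHERENT)
		return;			/* un-cached mapping: nothing to flush */
	map->flush_count++;		/* stand-in for a real cache flush */
}
```

Loading a KSEG1 address such as 0xa0001000 and then calling
toy_dmamap_sync() leaves flush_count untouched, while loading a cached
KSEG0 address such as 0x80001000 makes the same sync call count one flush.
The driver's calls are identical in both cases.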

This flag is meant ONLY as a hint.  Proper operation of a device driver
MUST NOT rely on the presence of this flag.  All of the bus_dmamap_sync()
operations are still required, but may be optimized based on the COHERENT
hint.  The reason that it is ONLY a hint is that DMA-coherent mappings
may not be possible on every architecture.  One example is the SEGA
Dreamcast.  The PCI bus on the Dreamcast doesn't do DMA to host memory.
Also, the SRAM buffer for the PCI bus cannot be accessed without a special
handshaking protocol.  It is therefore not possible to map DMA'able memory
on the Dreamcast in a coherent way -- it all must be bounced.
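
To make the "sync operations are still required" point concrete, here is
a small user-space sketch of a bounced map of the general kind described
above.  It is an assumption-laden toy, not the Dreamcast backend: the
bb_* names, the 16-byte buffer, and the two op codes (loosely modeled on
the BUS_DMASYNC_* operations) are all invented for illustration.  The
point is that with bouncing, the sync call is what actually moves the
data, so a driver that skips it is simply broken.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BB_SIZE	16

/* Hypothetical op codes, loosely modeled on the BUS_DMASYNC_* ops. */
enum bb_syncop { BB_SYNC_PREWRITE, BB_SYNC_POSTREAD };

struct bb_map {
	unsigned char	cpu_buf[BB_SIZE];	/* what the driver touches */
	unsigned char	bounce_buf[BB_SIZE];	/* what the device can reach */
};

/* Backend sync for a bounced map: copy in the direction the op implies. */
void
bb_sync(struct bb_map *map, enum bb_syncop op)
{
	assert(map != NULL);
	if (op == BB_SYNC_PREWRITE)
		memcpy(map->bounce_buf, map->cpu_buf, BB_SIZE);
	else
		memcpy(map->cpu_buf, map->bounce_buf, BB_SIZE);
}

/* Driver side: fill in a command, then sync before the device looks. */
void
bb_post_command(struct bb_map *map, unsigned char cmd)
{
	map->cpu_buf[0] = cmd;
	bb_sync(map, BB_SYNC_PREWRITE);
}

/* Driver side: sync after the device has written, then read the status. */
unsigned char
bb_read_status(struct bb_map *map)
{
	bb_sync(map, BB_SYNC_POSTREAD);
	return map->cpu_buf[1];
}
```

A store to cpu_buf that is not followed by bb_sync() never reaches
bounce_buf, which is exactly the failure a driver gets if it assumes the
COHERENT hint was honored and drops its sync calls.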

BUS_DMA_COHERENT, since it is defined only for bus_dmamem_map() and
bus_dmamem_mmap(), only applies to CPU mappings of the memory region
in question.  (Never mind that exactly HOW it is to be used with
bus_dmamem_mmap() isn't perfectly clear -- but that's an interface issue
in other parts of the kernel.)


Now, to address the use of bus_dmamap_load_raw().

bus_dmamap_load_raw() is an interface for loading "raw memory" allocated
with bus_dmamem_alloc() into a DMA map.  The intended use of this interface
is for memory that a device will DMA into, but which is not typically going
to be mapped into the CPU's address space.  The example I like to use is
that of a frame grabber.  Such a device may have a large amount of memory
that (1) the kernel doesn't care about at all, and (2) a user process may
only care about in small chunks at any given time.  In this case,
bus_dmamem_mmap() would be used by the user process to get at individual
chunks of the memory region.

For memory which is to be mapped into an address space *and* to be
used in DMA transactions, the correct set of operations to use is:

	/* Allocate the DMA safe memory. */
	bus_dmamem_alloc(...);

	/*
	 * Map the DMA safe memory into the address space, possibly with
	 * the COHERENT hint (e.g. a control block for a network device).
	 */
	bus_dmamem_map(...);

	/* Load the control block DMA map. */
	bus_dmamap_load(...);

bus_dmamap_load_raw() was *never* intended to be used as a way to load
the DMA map for e.g. a control block.

Now, if one follows the order of operations outlined above, then the
COHERENT hint is passed down properly to the MD backend.
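
Why the order matters can be sketched with another user-space toy.  The
om_* names below are stand-ins I made up for this example, and the model
assumes a backend that, like the DECstation one, learns about coherency
by looking at the CPU mapping of the buffer it is asked to load.  Under
that assumption, a load of a CPU-mapped buffer can see the hint, while a
raw load sees only memory segments and has nothing to look at.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define OM_FLAG_COHERENT	0x01	/* stand-in for BUS_DMA_COHERENT */

/* Raw DMA-safe memory: a physical range with no CPU mapping. */
struct om_segment { unsigned long paddr; size_t len; };

/* A CPU mapping of that memory, as a map step would produce. */
struct om_mapping { const struct om_segment *seg; bool uncached; };

/* A loaded DMA map. */
struct om_dmamap { unsigned long dma_addr; bool coherent; };

/* Stand-in for bus_dmamem_map(): the only step that is given the hint. */
struct om_mapping
om_mem_map(const struct om_segment *seg, int flags)
{
	struct om_mapping m = { seg, (flags & OM_FLAG_COHERENT) != 0 };
	return m;
}

/* Stand-in for bus_dmamap_load(): can inspect how the buffer is mapped. */
void
om_map_load(struct om_dmamap *map, const struct om_mapping *m)
{
	assert(map != NULL && m != NULL && m->seg != NULL);
	map->dma_addr = m->seg->paddr;
	map->coherent = m->uncached;
}

/* Stand-in for bus_dmamap_load_raw(): sees segments only, no mapping. */
void
om_map_load_raw(struct om_dmamap *map, const struct om_segment *seg)
{
	assert(map != NULL && seg != NULL);
	map->dma_addr = seg->paddr;
	map->coherent = false;
}
```

With the same segment, om_map_load_raw() followed by a COHERENT
om_mem_map() ends up with a map that knows nothing about the hint, while
om_mem_map() followed by om_map_load() yields the same DMA address with
the coherent bit set.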


The use of bus_dma(9) that brought up this discussion is that of the
ISP driver's Sbus front-end.  If you look at that code, it currently
does:

	bus_dmamap_create(...);

	bus_dmamem_alloc(...);

	bus_dmamap_load_raw(...);

	bus_dmamem_map(...);

This code is incorrect.  The correct order is:

	bus_dmamap_create(...);

	bus_dmamem_alloc(...);

	bus_dmamem_map(...);

	bus_dmamap_load(...);

If you look at any PCI Ethernet driver that I've written, you will find
that this is the same order of operations that I use for a similar kind
of thing (control data for the chip).

In the case of sparc64 (which is where all the problems are, apparently),
the COHERENT bit could be used to hint that the CPU mapping of the memory
should be un-cached.  And when the map is loaded with the CPU-mapped buffer,
the IOMMU PTE bits can get the "coherent" information.  But, even if the
IOMMU didn't get the "coherent" information, bus_dmamap_sync() operations
for the ISP mailbox, which MUST happen around CPU access to the mailbox,
must flush any I/O caches that may be managed by the IOMMU.

-- 
        -- Jason R. Thorpe <thorpej@zembu.com>