Subject: Re: On the subject of bus_dma(9)
To: Matthew Jacob <>
From: Jason R Thorpe <>
List: tech-kern
Date: 03/07/2001 11:34:42
On Wed, Mar 07, 2001 at 10:42:10AM -0800, Matthew Jacob wrote:

 >  'flags' should be set to DDI_DMA_STREAMING if the device is doing sequential,
 >  unidirectional, block-sized, and block-aligned transfers to or from memory.
 >  The alignment and padding constraints specified by the minxfer and burstsizes
 >  fields in the DMA attribute structure, ddi_dma_attr(9S) (see
 >  ddi_dma_alloc_handle(9F)) will be used to allocate the most effective
 >  hardware support for large transfers. For example, if an I/O transfer can be
 >  sped up by using an I/O cache, which has a minimum transfer of one cache
 >  line, ddi_dma_mem_alloc() will align the memory at a cache line boundary and
 >  it will round up real_length to a multiple of the cache line size.


So, as I understand this, STREAMING is the opposite of "coherent".
That is, in Solaris, "coherent" is the default unless STREAMING is specified;
STREAMING enables the use of the I/O cache, etc., and causes additional
constraints to be placed on the memory.  (Note: I put "coherent" in quotes
because in this case it means memory->device, not cpu->memory -- we currently
assume memory->device coherency in our API.)

So, what I'd like to do here, then, is propose that we clarify that COHERENT
means "coherency from the CPU to memory", and that we add a new flag,
BUS_DMA_STREAMING, to enable the use of the I/O cache, etc., (i.e. "undo
assumed memory->device coherency") as it seems to do in Solaris.

So, we would add the BUS_DMA_STREAMING flag to the following operations:

	- bus_dmamem_alloc: causes the alignment and sizes to be
	  rounded up, as necessary for the I/O cache.

	- bus_dmamap_load*: causes the I/O cache to be enabled for
	  that DMA mapping if the alignment and sizes are okay for
	  the I/O cache.

	- We keep the current assumption that, in the absence of
	  BUS_DMA_STREAMING, memory->device is "coherent".

	- bus_dmamap_sync() is *still* required to ensure coherency
	  (or, at least, up-to-date'ness) in the cpu->memory sense.

This would mean that e.g. chip control blocks would *not* be allocated
with BUS_DMA_STREAMING, and thus would be coherent from the memory->device
angle, but DMA maps for e.g. transfers to the SCSI bus would be loaded
with BUS_DMA_STREAMING, to get the benefit of the I/O cache.
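
To make the proposed semantics concrete, here is a minimal, self-contained
sketch in C. It is a toy model, not bus_dma(9) code: the sketch_* names, the
flag value, and the 32-byte I/O cache line are all assumptions made up for
illustration. It only shows the two decisions described above: rounding the
size at allocation time, and deciding at load time whether a mapping is
eligible for the I/O cache.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values, for illustration only; not from any real header. */
#define SKETCH_DMA_STREAMING	0x01u	/* stands in for BUS_DMA_STREAMING */
#define SKETCH_IOCACHE_LINE	32u	/* assumed I/O cache line size, bytes */

/* Round n up to a multiple of align, which must be a power of two. */
static size_t
sketch_roundup(size_t n, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (n + align - 1) & ~(align - 1);
}

/*
 * Models the bus_dmamem_alloc case: with the streaming flag, the size
 * is rounded up to a whole number of I/O cache lines; without it, the
 * request is left alone.
 */
static size_t
sketch_alloc_size(size_t size, unsigned flags)
{
	if (flags & SKETCH_DMA_STREAMING)
		return sketch_roundup(size, SKETCH_IOCACHE_LINE);
	return size;
}

/*
 * Models the bus_dmamap_load* case: the I/O cache is used only if the
 * caller asked for streaming AND the buffer's address and length are
 * both multiples of the I/O cache line size.  Otherwise the mapping
 * falls back to the safe, memory->device "coherent" behavior.
 */
static bool
sketch_load_uses_iocache(uintptr_t addr, size_t len, unsigned flags)
{
	if ((flags & SKETCH_DMA_STREAMING) == 0)
		return false;
	return (addr % SKETCH_IOCACHE_LINE) == 0 &&
	    (len % SKETCH_IOCACHE_LINE) == 0;
}
```

In this model a chip control block would be allocated and loaded with flags
of 0, so it never goes through the I/O cache, while a data buffer for a SCSI
transfer would pass SKETCH_DMA_STREAMING and get the I/O cache only when its
alignment and length permit.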

By doing it this way, we can ensure safe-but-maybe-not-as-fast
operation when the BUS_DMA_STREAMING flag is omitted.

How does this sound?

        -- Jason R. Thorpe <>