Subject: Re: On the subject of bus_dma(9)
To: Matthew Jacob <mjacob@feral.com>
From: Jason R Thorpe <thorpej@zembu.com>
List: tech-kern
Date: 03/06/2001 15:43:23
On Tue, Mar 06, 2001 at 02:55:53PM -0800, Matthew Jacob wrote:

 > "syncing" doesn't work here. It's mapping that does the right thing. And
 > bounce buffers for a major platform aren't the right answer. The assumption
 > that a post-mapping 'sync' operation exists is a flawed assumption.

I don't buy that -- if nothing else, you could invalidate the IOMMU PTEs
and re-validate them!  Like I said, bus_dmamap_sync() MUST be what a driver
uses to ensure "coherency" -- this has not changed since the very first
bus_dma design document (which you even provided input on -- and this didn't
come up as an issue then, and the UltraSPARC systems in question certainly
existed at that time).

I think part of the problem here is that no one has been clear *at all*
as to what the particular problems with the sparc64 even *are*.

Is there a cache between the memory and the device?  (The Sun 4/400 has
such an "I/O cache", as well.)  And you're saying that there is no way
to invalidate this cache once the IOMMU mappings have been set up?

 > What I'm going to do is to parse your mail again and update the man page with
 > mention of the ordering dependency and bus_dmamap_load_raw usages- after all,

Please let me make any manual page updates which may be necessary.  Note,
I'm saying that you *should not* be using bus_dmamap_load_raw().

 > without proper documentation things will be done in the 'wrong' order. Note
 > that doing things in the 'right' order wouldn't fix the problems on sparc64.
 > 
 > Now, you say that the correct order is:
 > 
 >         bus_dmamap_create(...);
 > 
 >         bus_dmamem_alloc(...);
 > 
 >         bus_dmamem_map(...);
 > 
 >         bus_dmamap_load(...);

Actually, if you look at *every single* bus_dma'd driver in the tree,
with the exception of a few sparc/sparc64 drivers, they all do bus_dma
operations for control blocks in this way.  Basically, it looks like what
happened is that someone (looks like it was pk :-) did the Wrong thing
with bus_dma in one Sbus driver, and it was propagated to some others.
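To make the ordering argument concrete, here is a minimal userland mock, not kernel code. Every name in it (cb_mock, cb_create, cb_alloc, cb_map, cb_load, cb_setup) is hypothetical and merely stands in for the corresponding bus_dma(9) step; the real routines take many more arguments. It only illustrates that a load step can inherit a coherency hint from the mapping step if the mapping already exists.

```c
#include <assert.h>

/* Hypothetical userland mock: tracks which setup steps have been done. */
#define CB_CREATED 0x1u
#define CB_ALLOCED 0x2u
#define CB_MAPPED  0x4u
#define CB_LOADED  0x8u

struct cb_mock {
    unsigned done;      /* bitmask of completed steps */
    int kva_coherent;   /* hint recorded by the CPU mapping step */
    int dvma_coherent;  /* what the load step ends up using */
};

/* Stands in for bus_dmamap_create(). */
static int cb_create(struct cb_mock *m) { m->done |= CB_CREATED; return 0; }

/* Stands in for bus_dmamem_alloc(). */
static int cb_alloc(struct cb_mock *m) { m->done |= CB_ALLOCED; return 0; }

/* Stands in for bus_dmamem_map(); needs allocated memory. */
static int cb_map(struct cb_mock *m, int coherent)
{
    if (!(m->done & CB_ALLOCED))
        return -1;
    m->kva_coherent = coherent;
    m->done |= CB_MAPPED;
    return 0;
}

/* Stands in for bus_dmamap_load(); needs a DMA map and a CPU mapping. */
static int cb_load(struct cb_mock *m)
{
    unsigned need = CB_CREATED | CB_MAPPED;

    if ((m->done & need) != need)
        return -1;
    m->dvma_coherent = m->kva_coherent;  /* inherit the hint */
    m->done |= CB_LOADED;
    return 0;
}

/* The control-block sequence: create, alloc, map, load. */
int cb_setup(struct cb_mock *m, int coherent)
{
    if (cb_create(m) != 0) return -1;
    if (cb_alloc(m) != 0)  return -1;
    if (cb_map(m, coherent) != 0) return -1;
    return cb_load(m);
}
```

In this mock, calling cb_load() on a fresh cb_mock fails, while cb_setup() succeeds and carries the coherent hint through to the load step.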

Also, why won't this "fix" the sparc64?  bus_dmamap_load() has to query
the pmap for the physical address of the page to stuff into the IOMMU
PTEs, and while it's there, it could certainly look for the TLB's "cache
inhibit" bits, and set its own if they are set.
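As a rough sketch of that idea, assuming made-up bit layouts: the HYP_* constants and hyp_make_iopte() below are hypothetical and are not the real sparc64 TTE or IOMMU PTE formats. The point is only that the load path already has the CPU mapping's attributes in hand when it builds the IOMMU entry.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit assignments, for illustration only. */
#define HYP_CPU_PTE_CACHE_INHIBIT (UINT64_C(1) << 0)
#define HYP_IOPTE_NOCACHE         (UINT64_C(1) << 1)
#define HYP_IOPTE_VALID           (UINT64_C(1) << 63)
#define HYP_PAGE_OFFSET_MASK      UINT64_C(0xfff)

/*
 * Build an IOMMU PTE for physical address pa.  If the CPU mapping of
 * the page is cache-inhibited, mark the IOMMU entry the same way.
 */
uint64_t hyp_make_iopte(uint64_t pa, uint64_t cpu_pte_flags)
{
    uint64_t iopte = (pa & ~HYP_PAGE_OFFSET_MASK) | HYP_IOPTE_VALID;

    if (cpu_pte_flags & HYP_CPU_PTE_CACHE_INHIBIT)
        iopte |= HYP_IOPTE_NOCACHE;
    return iopte;
}
```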

 > So, the assumption here is that you have to have CPU
 > (BUS_DMA_COHERENT) mapping in order to infer you want
 > a byte-coherent IOMMU mapping. This doesn't handle the
 > case of two non-cpu devices sharing memory, but that is,
 > I would agree, a bit of an edge case.

In fact, bus_dma(9) does not currently address peer-peer DMA at all.  This
is something I'm working on in my "idle loop" (along with other changes I
want to make to the bus_dma(9) interface... and since I'd like to change
it as few times as possible, I want to try and group them all together).

 > So.. follow the logic here....
 > 
 > The inference of all of this is that
 > 
 > 1. Prior to a bus_dmamap_load, a bus_dmamem_map has to be done to give the
 > 'hint' that BUS_DMA_COHERENT is to be used.

bus_dmamap_load(), as is described in the manual, requires a buffer mapped
into either kernel virtual address space (proc argument == NULL) or a user
process virtual address space (proc argument non-NULL).  The *only* loading
routine which does not require the buffer to be mapped into someone's address
space is bus_dmamap_load_raw(), and that routine is specifically designed
to load pages that aren't in *anyone's* address space.

 > 2. But real memory is allocated (in the order above) prior to the call to
 > bus_dmamem_map.
 > 
 > 
 > 3. Therefore, bus_dmamem_alloc'd memory must be assumed to be for
 > BUS_DMA_COHERENT purposes because there may be no way to 'change'
 > the identity of the memory alloc'd in bus_dmamem_alloc (it might be
 > from a pool that *can't* be made byte-coherent).

bus_dmamem_alloc() for a particular bus_dma_tag_t *should never* allocate
from memory which cannot be made to DTRT for a particular bus.  All this
does is point to a bug in the sparc64 bus_dmamem_alloc() routines.

Case in point -- bus_dmamem_alloc() on the i386, when presented with
the ISA bus_dma_tag_t, *never* allocates pages above 16M (the 24-bit ISA
DMA limit) in the physical address space.
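A toy version of a tag-constrained allocator might look like the following. This is a sketch under stated assumptions, not the i386 implementation: hyp_dma_tag, hyp_pick_page() and HYP_ISA_DMA_LIMIT are hypothetical names, and the "allocator" simply scans a list of free physical addresses. ISA DMA uses 24-bit addresses, so the mock ISA tag's limit is 1 << 24 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ISA DMA can only address 24 bits of physical memory. */
#define HYP_ISA_DMA_LIMIT (UINT64_C(1) << 24)

/* Hypothetical tag: the bus can reach physical addresses below pa_limit. */
struct hyp_dma_tag {
    uint64_t pa_limit;
};

/*
 * Return the index of the first free page the bus can reach,
 * or -1 if no page in the pool is usable for this tag.
 */
int hyp_pick_page(const struct hyp_dma_tag *t,
                  const uint64_t *free_pa, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (free_pa[i] < t->pa_limit)
            return (int)i;
    }
    return -1;
}
```

With a pool holding one page at 32M and one at 8M, the mock ISA tag skips the first page and picks the second; a tag with a 4G limit takes the first.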

...and in the case of the Dreamcast, bus_dmamem_alloc() merely allocates
physical pages, but the sync operations always ensure coherency (because
they copy to the PCI SRAM buffer).
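For illustration, a copy-based sync could be sketched roughly as below. This is not the Dreamcast code: hyp_bounce_map, hyp_sync_op and hyp_dmamap_sync() are hypothetical, and only two of the sync directions are modelled. The driver's buffer can live anywhere; coherency comes entirely from the copy performed at sync time.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical subset of sync operations. */
enum hyp_sync_op {
    HYP_SYNC_PREWRITE,  /* before the device reads the data (memory to device) */
    HYP_SYNC_POSTREAD   /* after the device wrote the data (device to memory) */
};

/* Hypothetical map: the driver's buffer plus the device-visible buffer. */
struct hyp_bounce_map {
    void  *buf;     /* ordinary memory the driver works with */
    void  *bounce;  /* memory the device can actually reach */
    size_t len;
};

/* Make the two copies agree in the direction the operation requires. */
void hyp_dmamap_sync(struct hyp_bounce_map *m, enum hyp_sync_op op)
{
    switch (op) {
    case HYP_SYNC_PREWRITE:
        memcpy(m->bounce, m->buf, m->len);
        break;
    case HYP_SYNC_POSTREAD:
        memcpy(m->buf, m->bounce, m->len);
        break;
    }
}
```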

-- 
        -- Jason R. Thorpe <thorpej@zembu.com>