Subject: Re: On the subject of bus_dma(9)
To: Jason R Thorpe <>
From: Matthew Jacob <>
List: tech-kern
Date: 03/07/2001 08:55:33
On Tue, 6 Mar 2001, Jason R Thorpe wrote:

> On Tue, Mar 06, 2001 at 02:55:53PM -0800, Matthew Jacob wrote:
>  > "syncing" doesn't work here. It's mapping that does the right thing. And
>  > bounce buffers for a major platorm isn't the right answer. The assumption
>  > that a post-mapping 'sync' operation exists is a flawed assumption.
> I don't buy that -- if nothing else, you could invalidate the IOMMU PTEs
> and re-validate them!

You can't for things less than IOMMU page sized entities, and that's exactly
what this memory is for.

>  Like I said, bus_dmamap_sync() MUST be what a driver
> uses to ensure "coherency" -- this has not changed since the very first
> bus_dma design document (which you even provided input on -- and this didn't
> come up as an issue then, and the UltraSPARC systems in question certainly
> existed at that time).

I provided input on this, and there were many things you didn't agree with.
One of them was the Solaris DDI DMA model differences which, btw, handles this
stuff fine.

> I think part of the problem here is that no one has been clear *at all*
> as to what the particular problems with the sparc64 even *are*.
> Is there a cache between the memory and the device?  (The Sun 4/400 has
> such an "I/O cache", as well.)  And you're saying that there is no way
> to invalidate this cache once the IOMMU mappings have been setup?

I'd like us to stick to architecture rather than implementation- but in this
case the issue is whether you map a page in the IOMMU for 'streaming' or 'byte
coherent' access. It's the mapping that establishes the attribute- not any
later 'flush' operation.

>  > What I'm going to do is to parse your mail again and update the man page with
>  > mention of the ordering dependency and bus_dmamap_load_raw usages- after all,
> Please let me make any manual page updates which may be necessary.
>  Note, I'm saying that you *should not* be using bus_dmamap_load_raw().

Okay. Like I said- *I* didn't add this to isp_sbus.c- that was done by
somebody else.

> Also, why won't this "fix" the sparc64?  bus_dmamap_load() has to query
> the pmap for the physical address of the page to stuff into the IOMMU
> PTEs, and while it's there, it could certainly look for the TLB's "cache
> inhibit" bits, and set its own if they are set.

It's conceivable that in this way the implementation could follow the
architecture- if the architecture is clear and clearly documented.

If the ordering is as you said:

 >         bus_dmamap_create(...);
 >         bus_dmamem_alloc(...);
 >         bus_dmamem_map(...);
 >         bus_dmamap_load(...);

Then the requirement could conceivably met in this case for the implementation
in sparc64- that bus_dmamem_map has to be called in order to establish that
this memory must then be also byte-coherent in IOMMU mappings.

From an efficiency point of view just for this implementation, I'd want to
leave it that bus_dmamem_alloc'd memory *must* be BUS_DMA_COHERENT.

From an architectural point of view, there isn't sufficient hint as to what
*kind* of memory to allocate if you don't get the hint until *after* you
allocate the memory. You may have a platform where in order to provide byte
coherent access (and *not* require bounce, which is ridiculous, when DMA works
just fine if you program things sensibly) you have to allocate from a certain
pool. It seems wrong to not have the hint when you need it.

>  > So, the assumption here is that you have to have CPU
>  > (BUS_DMA_COHERENT) mapping in order to infer you want
>  > an byte-coherent IOMMU mapping. This doesn't handle the
>  > case of two non-cpu devices sharing memory, but that is,
>  > I would agree, a bit of an edge case.
> In fact, bus_dma(9) does not currently address peer-peer DMA at all.  This
> is something I'm working on in my "idle loop" (along with other changes I
> want to make to the bus_dma(9) interface... and since I'd like to change
> it as few times as possible, I want to try and group them all together).

Fair enough.

>  > So.. follow the logic here....
>  > 
>  > The inference of all of this is that
>  > 
>  > 1. Prior to a bus_dmamap_load, a bus_dmamem_map has to be done to give the
>  > 'hint' that BUS_DMA_COHERENT is to be used.
> bus_dmamap_load(), as is described in the manual, requires a buffer mapped
> into either kernel virtual address space (proc argument == NULL) or a user
> process virtual address space (proc argument non-NULL).  The *only* loading
> routine which does not require the buffer to be mapped into someone's address
> space is bus_dmamamp_load_raw(), and that routine is specifically designed
> to load pages that aren't in *anyone's* address space.

Okay- lacking the architectural hole about peer-peer, this is a valid
argument. The man page really needs to be clarified about this. There are
several engineers here who have lots of experience who really didn't follow
this. A good architecture is easy for reasonable engineers to follow.

>  > 2. But real memory is allocated (in the order above) prior to the call to
>  > bus_dmamem_map.
>  > 
>  > 
>  > 3. Therefore, bus_dmamem_alloc'd memory must be assumed to be for
>  > BUS_DMA_COHERENT purposes because there may be no way to 'change'
>  > the identity of the memory alloc'd in bus_dmamem_alloc (it might be
>  > from a pool that *can't* be made byte-coherent).
> bus_dmamem_alloc() for a particular bus_dma_tag_t *should never* allocate
> from memory which can not be made to DTRT for a particular bus.  All this
> does it point to a bug in the sparc64 bus_dmamem_alloc() routines.

No - this is inflammatory, and you should be ashamed for saying it.

It's not a 'bug' in that there has been genuine disagreement and uncertainty
about how this is actually supposed to work.

I would put it as I've always put it throughout this discussion as I've done
above. And this should be documented as an explicit requirement because the
logic of the *architecture* (not the games you could play with
implementation) require it.

And, btw, just because you haven't handled peer-peer- you might at least
document the issue.

So- please reconsider letting me update the man page with some clarifications.