tech-kern archive


Re: Access to DMA memory while DMA in progress?



On Fri, 27 Oct 2017, Mouse wrote:

> I would like to read the DMA buffer while DMA is still going on.  That
> is, I have a buffer of (say) 64K and the hardware is busily writing
> into it; I want to read the buffer and see what the hardware has
> written in the memory it has written and what used to be there in the
> memory it hasn't.  I'm fine if the CPU's view lags the hardware's view
> slightly, but I do care about the CPU's view of the DMA write order
> matching the hardware's: that is, if the CPU sees the value written by
> a given DMA cycle, then the CPU must also see the values written by all
> previous DMA cycles.  (This reading is being carried out from within
> the kernel, by driver code.  I might be able to move it to userland,
> but it would surprise me if userland could do something the kernel
> can't.)

This is all very hardware dependent.

Make sure you map that area with the BUS_DMA_COHERENT flag.  It will 
disable as much caching as possible on those sections of memory.  On 
some hardware it may be required: without it, the CPU may not be able to 
read the data until the segment is bus_dmamem_unmap()ped, even with the 
bus_dmamap_sync() operations.

Many NICs do something like this.  They have a ring buffer the CPU sets up 
with pointers to other buffers to hold incoming packets.  When a packet 
comes in the NIC writes out the contents and then updates the pointer to 
indicate DMA completion.  The CPU then swaps the pointer with one pointing 
to an empty buffer.
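
To make that ring concrete, here is a rough userland sketch in C.  It is 
not taken from any real driver: the names rx_desc, rx_ring, DESC_DONE, 
fake_device_complete() and rx_ring_take() are all made up for 
illustration, and the "device" is just a function that fills a buffer and 
sets the done flag last.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 4
#define DESC_DONE 0x1u  /* hypothetical "DMA complete" flag set by the device */

struct rx_desc {
	uint8_t  *buf;    /* buffer the device fills */
	uint32_t  len;    /* bytes written by the device */
	uint32_t  flags;  /* DESC_DONE once the device has finished */
};

struct rx_ring {
	struct rx_desc desc[RING_SIZE];
	unsigned next;    /* next descriptor the CPU will examine */
};

/* Stand-in for the device: write the payload, then mark the descriptor. */
static void
fake_device_complete(struct rx_desc *d, uint8_t fill, uint32_t len)
{
	for (uint32_t i = 0; i < len; i++)
		d->buf[i] = fill;
	d->len = len;
	d->flags |= DESC_DONE;  /* written last, after the payload */
}

/*
 * CPU side: if the next descriptor is done, return its filled buffer and
 * install `empty` in its place.  Returns NULL if the device has not
 * completed that descriptor yet.
 */
static uint8_t *
rx_ring_take(struct rx_ring *r, uint8_t *empty, uint32_t *lenp)
{
	struct rx_desc *d = &r->desc[r->next];

	if ((d->flags & DESC_DONE) == 0)
		return NULL;

	uint8_t *full = d->buf;
	*lenp = d->len;
	d->buf = empty;   /* swap in the empty buffer */
	d->len = 0;
	d->flags = 0;     /* hand the descriptor back to the device */
	r->next = (r->next + 1) % RING_SIZE;
	return full;
}
```

The sketch leaves out everything bus_dma-specific.  In a real driver the 
descriptor would have to be synced before the CPU tests the done flag, and 
the flag layout is whatever the hardware defines.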

> 
> But I'm not sure what sort of sync calls I need to make.  Because of
> things like bounce buffers and data caches, I presumably need
> bus_dmamap_sync(BUS_DMASYNC_POSTREAD) somewhere in the mix, but it is
> not clear to me how/when, nor how fine-grained those calls can be.  Do
> I just POSTREAD each byte/word/whatever before I read it?  How
> expensive is bus_dmamap_sync - for example, is a 1K sync significantly
> cheaper than four 256-byte syncs covering the same memory?  If I'm
> reading a bunch of (say) uint32_ts, is it reasonable to POSTREAD each
> uint32_t individually?  If I POSTREAD something that DMA hasn't written
> yet, will it work to POSTREAD it again (and then read it) after DMA
> _has_ written it?  Is BUS_DMA_STREAMING relevant?  I will be
> experimenting to see what seems to work, but I'd like to understand
> what is promised, not just what happens to work on my development
> system.
> 
> Of course, there is the risk of reading a partially-written datum.  In
> my case (aligned uint32_ts on amd64) I don't think that can happen.

You want to do a bus_dmamap_sync(BUS_DMASYNC_POSTREAD) for each... let's 
call it a snapshot.  It will try to provide the CPU a consistent view of 
that section of memory at the time the sync call is made.
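
The snapshot idea can be sketched in plain C as a toy model.  This is an 
assumption-laden illustration, not kernel code: fake_dmamap, 
fake_sync_postread() and take_snapshot() are hypothetical names, and the 
model behaves like a bounce buffer, where the CPU's copy only changes when 
a sync is done.  In the kernel the sync step would be the real 
bus_dmamap_sync(tag, map, offset, len, BUS_DMASYNC_POSTREAD) call.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NWORDS 8

/* Hypothetical model: the device writes dev_mem, the CPU reads cpu_view. */
struct fake_dmamap {
	uint32_t dev_mem[NWORDS];
	uint32_t cpu_view[NWORDS];
};

/* Stand-in for a POSTREAD sync over [off, off + len) bytes of the map. */
static void
fake_sync_postread(struct fake_dmamap *m, size_t off, size_t len)
{
	memcpy((uint8_t *)m->cpu_view + off,
	    (const uint8_t *)m->dev_mem + off, len);
}

/* One snapshot: sync the whole range once, then copy it out. */
static void
take_snapshot(struct fake_dmamap *m, uint32_t *out, size_t first,
    size_t count)
{
	fake_sync_postread(m, first * sizeof(uint32_t),
	    count * sizeof(uint32_t));
	memcpy(out, &m->cpu_view[first], count * sizeof(uint32_t));
}
```

In this model a word the device has not written yet simply shows its old 
value, and a later snapshot picks up the new one.  Whether real hardware 
behaves that way is exactly the hardware-dependent part.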

The cost of these operations is very hardware dependent.  On some machines 
the bus_dmamem_map() operation, with or without the BUS_DMA_COHERENT flag, 
will turn off all caches and the bus_dmamap_sync() calls are no-ops.

On hardware that has an I/O cache, bus_dmamap_sync() may need to flush it 
first to get the DMA data into the coherency domain.

If there's a CPU cache that has not been disabled for that section of 
memory, bus_dmamap_sync() may need to invalidate it.

In the NIC example above, you map the ring buffer with BUS_DMA_COHERENT, 
fill it up and do a bus_dmamap_sync(BUS_DMASYNC_PREREAD).  When you want 
to read it (usually after getting an interrupt) you do 
bus_dmamap_sync(BUS_DMASYNC_POSTREAD) before doing the read.

I have long argued that we should also have bus_dma accessor functions 
like the ones used by bus_space to access device registers.  They can do 
fun things like fixing up alignment and endianness swapping without having 
to litter the driver with code only needed for certain hardware.
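
As an illustration of what such an accessor might look like, here is a 
minimal sketch.  No such functions exist in bus_dma; dma_read_le32() and 
dma_write_le32() are hypothetical names, and the sketch assumes the device 
stores 32-bit values little-endian.  Going byte by byte makes the result 
independent of both the host's byte order and the alignment of the offset.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical accessor: read a little-endian 32-bit value at any offset. */
static uint32_t
dma_read_le32(const void *base, size_t off)
{
	const uint8_t *p = (const uint8_t *)base + off;

	return (uint32_t)p[0] |
	    ((uint32_t)p[1] << 8) |
	    ((uint32_t)p[2] << 16) |
	    ((uint32_t)p[3] << 24);
}

/* Hypothetical accessor: store a 32-bit value in little-endian order. */
static void
dma_write_le32(void *base, size_t off, uint32_t v)
{
	uint8_t *p = (uint8_t *)base + off;

	p[0] = (uint8_t)(v & 0xff);
	p[1] = (uint8_t)((v >> 8) & 0xff);
	p[2] = (uint8_t)((v >> 16) & 0xff);
	p[3] = (uint8_t)((v >> 24) & 0xff);
}
```

The price of the byte-wise approach is that the value is no longer 
fetched in a single access, so it does not protect against reading a 
partially-written datum.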

> The presence of bus_dmamem_mmap seems to me to imply that it should be
> possible to make simple memory accesses Just Work, but it's not clear
> to me to what extent bus_dmamem_mmap supports _concurrent_ access by
> DMA and userland (for example, does the driver have to
> BUS_DMASYNC_POSTREAD after the DMA and before userland access to
> mmapped memory, or does the equivalent happen automagically, eg in the
> page fault handler, or does bus_dmamem_mmap succeed only on systems
> where no such care needs to be taken, or what?).

Trying to do this in userland on a machine with an I/O cache won't work 
too well.

> My impression is that bus_dma is pretty stable, and, thus, version
> doesn't matter much.  But, in case it matters, 5.2 on amd64.

AFAIK amd64 disables all caches on BUS_DMA_COHERENT, so the sync 
operations aren't really necessary.  But jumping through all these hoops 
is important on other hardware.

Eduardo

