tech-kern archive

Re: avoiding bus_dmamap_sync() costs



On Fri, Jul 13, 2012 at 11:42:57AM -0500, David Young wrote:
> 
> > Writing to un-snooped cached addresses when dma might write to the
> > same cache line before the write completes will be impossible to get
> > right on any architecture.
> 
> It will be impossible, and that reminds me that I was going to ask in my
> previous email in this thread: is there any possibility whatsoever that
> wm(4) can work on those architectures that lack DMA coherency, even if
> all of the right bus_dmamap_sync()s are in place?  ISTR that when you
> and I discussed this in another venue, we agreed that the answer was
> "no" if the DMA descriptors were cached.

I suspect it can be made to work, but it isn't worth the trouble.
You need to fill in RX descriptors by the cache-line full, and only care
about the TX status of the last active descriptor in each cache line.
Even then, that will only work if the DMA writes to the TX descriptors
only affect the entry being written. If the DMA is doing cache-line RMW
on the descriptors, the driver would always have to set up a full cache
line (maybe fragment the tx), or wait for the outstanding TX to complete
before adding any more.
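
As a rough sketch of the bookkeeping this implies (not code from wm(4)):
the helpers below round the number of RX descriptors to post down to
whole cache lines, and pick which TX descriptor's status to look at for
a given line. The 64-byte line, the 16-byte descriptor, and the helper
names are all assumptions made for illustration, and ring wrap-around
is ignored.

```c
/* Assumed sizes, for illustration only. */
#define CACHE_LINE	64u
#define DESC_SIZE	16u
#define DESC_PER_LINE	(CACHE_LINE / DESC_SIZE)

/*
 * Number of free RX descriptors that can be handed to the hardware
 * if they may only be filled in by the whole cache line.
 */
static unsigned
rx_postable(unsigned nfree)
{
	return nfree - (nfree % DESC_PER_LINE);
}

/*
 * Index of the TX descriptor whose status is worth checking for the
 * cache line containing idx: the last active one in that line.
 * Descriptors 0 .. tail-1 are active; requires idx < tail.
 */
static unsigned
tx_status_slot(unsigned idx, unsigned tail)
{
	unsigned line_end = idx - (idx % DESC_PER_LINE) + DESC_PER_LINE - 1;
	unsigned last = tail - 1;

	return line_end < last ? line_end : last;
}
```

With 4 descriptors per line, 10 free RX descriptors means only 8 get
posted, and the remaining 2 wait until the line can be completed.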

Uncached accesses are probably less hassle - and probably as fast as
the cache-flushing ops (for descriptors).

> It's always bothered me that the bus_dmamem_map() flag BUS_DMA_COHERENT,
> is only *advice* to the backend.

If it is only advice, the driver would need to know the answer.

What a lot of the code does need is something to stop gcc reordering
instructions. An asm volatile ("" ::: "memory") does that for you.
As should marking the memory volatile.
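
A minimal sketch of the compiler barrier, assuming gcc (or a compiler
that accepts gcc-style asm). The descriptor layout, the DESC_OWN bit and
the function name are made up for illustration and are not taken from
any driver. Note the barrier only constrains the compiler; it emits no
instruction, so it does nothing about ordering in the hardware.

```c
#include <stdint.h>

/* Compiler-only barrier: gcc may not move memory accesses across it. */
#define compiler_barrier()	__asm__ __volatile__("" ::: "memory")

/* Hypothetical descriptor layout, for illustration only. */
struct desc {
	uint32_t addr;
	uint32_t len;
	uint32_t status;
};

#define DESC_OWN	0x80000000u	/* hypothetical "owned by device" bit */

static void
desc_post(struct desc *d, uint32_t addr, uint32_t len)
{
	d->addr = addr;
	d->len = len;
	/* Keep the stores above ahead of the ownership store below. */
	compiler_barrier();
	d->status = DESC_OWN;
}
```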

Other CPUs do have different requirements of course, but for 'normal'
operation I suspect they are mostly:
1a) pre-setup invalidate cache for an rx buffer (allowed to write)
1b) pre-read invalidate cache for an rx buffer (only if known to be clean)
2) write back (and invalidate) cache for a tx buffer
3) write -> write ordering for uncached memory
4) read -> write ordering for uncached memory
5) read -> read ordering for uncached memory
6) write -> read ordering for uncached memory

The last 4 aren't DMA operations; they are needed for PIO as well.

Anything else is probably highly architecture specific.

        David

-- 
David Laight: david%l8s.co.uk@localhost
