tech-kern archive
Re: avoiding bus_dmamap_sync() costs
On Thu, Jul 12, 2012 at 09:05:10PM -0500, David Young wrote:
>
> In general, it is necessary on x86 to flush the store buffer on a
> _PREREAD operation so that if we write a word to a DMA-able address and
> subsequently read the same address again, the CPU will not satisfy the
> read with store-buffer content (i.e., the word that we just wrote), but
> with the last word written at that address by any agent.
I thought that was only true for cached addresses.
On x86 uncached accesses bypass the store buffer (don't they also
flush it?)
(Ignoring the obscure non-temporal instructions etc.)
When do you care whether the read is serviced from the store buffer
or from the cache line?
You can only be interested in the value after the dma entity has read
the value being written, and updated it with a new value.
Reads serviced from the store buffer are only problems for device
registers - and no one uses cached accesses for those.
Writing to un-snooped cached addresses when dma might write to the
same cache line before the write completes will be impossible to get
right on any architecture.
Similarly it isn't going to work if dma and cpu write to the same
memory location.
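
For driver-defined structures, one way to stay clear of the shared-cache-line case is to give CPU-written and device-written fields a cache line each. The following is only a sketch: the structure, its field names, and the 64-byte line size are assumptions made for illustration, and hardware-defined descriptor layouts cannot be rearranged this way.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define SK_CACHE_LINE 64	/* assumed cache line size, not queried from the CPU */

/* Hypothetical slot shared with a device; names are illustrative only. */
struct sk_ring_slot {
	alignas(SK_CACHE_LINE) uint32_t cpu_cmd;	/* written by the CPU only */
	alignas(SK_CACHE_LINE) uint32_t dev_status;	/* written by the device only */
};

/* Fail the build if the two writers could ever touch the same line. */
static_assert(offsetof(struct sk_ring_slot, cpu_cmd) / SK_CACHE_LINE !=
    offsetof(struct sk_ring_slot, dev_status) / SK_CACHE_LINE,
    "CPU-written and device-written fields share a cache line");
```

The cost is padding: each slot grows to two full lines, which is the price of never having both agents dirty the same line.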
> 	if ((status & WTX_ST_DD) == 0) {
> 		WM_CDTXSYNC(sc, txs->txs_lastdesc, 1,
> 		    BUS_DMASYNC_PREREAD | BUS_DMASYNC_CLEAN);
> 		break;
> 	}
>
> And the x86 implementation of bus_dmamap_sync() would just skip the
> locked instruction if BUS_DMASYNC_CLEAN was in the flags.
My worry is that passing flags to a function tends to lead to run-time
checks, whereas the requirement for these sync/barrier operations is mostly
known at compile time. All the conditionals might take longer than the lock
(which is what happened with the old LOCKMGR code).
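
A minimal sketch of the two shapes being contrasted. It is not the NetBSD implementation: the `SK_*` flag values, the `sk_` function names, and the fence counter are all hypothetical, and it assumes (following the quoted mail) that only a non-clean PREREAD needs the barrier.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical flag values for this sketch, not the real bus_dma ones. */
#define SK_PREREAD	0x01
#define SK_POSTREAD	0x02
#define SK_PREWRITE	0x04
#define SK_POSTWRITE	0x08
#define SK_CLEAN	0x10	/* stands in for the proposed BUS_DMASYNC_CLEAN */
#define SK_ALLOPS	0x1f

static unsigned sk_fences;	/* counts barriers issued, for demonstration only */

static inline void
sk_barrier(void)
{
	/* Stand-in for the locked instruction that drains the store buffer. */
	atomic_thread_fence(memory_order_seq_cst);
	sk_fences++;
}

void sk_sync_runtime(int ops);

/* Run-time shape: the flags are decoded on every call. */
void
sk_sync_runtime(int ops)
{
	assert((ops & ~SK_ALLOPS) == 0);
	if ((ops & SK_PREREAD) != 0 && (ops & SK_CLEAN) == 0)
		sk_barrier();
}

/* Compile-time shape: the caller picks the case, so no test remains. */
static inline void
sk_sync_preread(void)
{
	sk_barrier();
}

static inline void
sk_sync_preread_clean(void)
{
	/* Nothing to do: the caller has not written since the last sync. */
}
```

A caller that knows its case statically would use `sk_sync_preread_clean()` and pay nothing, while `sk_sync_runtime(SK_PREREAD | SK_CLEAN)` still executes the flag tests unless the compiler can inline the call and fold the constants.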
David
--
David Laight: david%l8s.co.uk@localhost