tech-kern archive


Re: avoiding bus_dmamap_sync() costs



On Fri, Jul 13, 2012 at 07:55:11AM +0100, David Laight wrote:
> On Thu, Jul 12, 2012 at 09:05:10PM -0500, David Young wrote:
> > 
> > In general, it is necessary on x86 to flush the store buffer on a
> > _PREREAD operation so that if we write a word to a DMA-able address and
> > subsequently read the same address again, the CPU will not satisfy the
> > read with store-buffer content (i.e., the word that we just wrote), but
> > with the last word written at that address by any agent.
> 
> I thought that was only true for cached addresses.
> On x86 uncached accesses bypass the store buffer (don't they also
> flush it?)
> (Ignoring the obscure non-temporal instructions etc.)

Right, uncached accesses bypass the store buffer.

> When do you care whether the read is serviced from the store buffer
> or from the cache line?

One cares when the store buffer or the cache line is stale.  I.e.,
there's newer information either in the cache or in RAM.

> You can only be interested in the value after the dma entity has read
> the value being written, and updated it with a new value.
> Reads serviced from the store buffer are only problems for device
> registers - and no one uses cached accesses for those.

Ok, I see what you're saying.  I will have to think about that.

This all raises the question of whether the bus_dmamap_sync() operations
are generally the right ones.

> Writing to un-snooped cached addresses when dma might write to the
> same cache line before the write completes will be impossible to get
> right on any architecture.

It will be impossible, and that reminds me that I was going to ask in my
previous email in this thread: is there any possibility whatsoever that
wm(4) can work on those architectures that lack DMA coherency, even if
all of the right bus_dmamap_sync()s are in place?  ISTR that when you
and I discussed this in another venue, we agreed that the answer was
"no" if the DMA descriptors were cached.

It's always bothered me that the bus_dmamem_map() flag BUS_DMA_COHERENT
is only *advice* to the backend.

> >                 if ((status & WTX_ST_DD) == 0) {
> >                         WM_CDTXSYNC(sc, txs->txs_lastdesc, 1,
> >                             BUS_DMASYNC_PREREAD | BUS_DMASYNC_CLEAN);
> >                         break;
> >                 }
> > 
> > And the x86 implementation of bus_dmamap_sync() would just skip the
> > locked instruction if BUS_DMASYNC_CLEAN was in the flags.
> 
> My worry is that using flags to a function tends to lead to run-time
> checks - whereas the requirement for these syncs/barriers is mostly
> known at compile time. All the conditionals might take longer than the
> lock (which is what happened with the old LOCKMGR code).

On this point, somebody's advice to me (may have been yours!) was to
introduce, on those archs that will benefit, an always-inlined shim that
makes some of the _sync()s collapse at compile-time to nothing:

static inline __attribute__((__always_inline__)) void
bus_dmamap_sync(..., int ops)
{
        /* XXX not sure if the constant-ness propagates through the
         * always-inline call.  That is, may need to use a macro, *sigh*.
         */
        if (!__builtin_constant_p(ops)) {
                _bus_dmamap_sync(..., ops);
                return;
        }
        switch (ops) {
        case BUS_DMASYNC_PREREAD|BUS_DMASYNC_CLEAN:
        case BUS_DMASYNC_PREWRITE|BUS_DMASYNC_CLEAN:
        case BUS_DMASYNC_PREREAD|BUS_DMASYNC_PREWRITE|BUS_DMASYNC_CLEAN:
                return;
        default:
                _bus_dmamap_sync(..., ops);
                return;
        }
}
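
For anyone who wants to experiment with the idea, here is a minimal
self-contained sketch of the shim above.  Several things in it are
assumptions made only for illustration: the numeric flag values are
invented rather than taken from any real header, BUS_DMASYNC_CLEAN is
just the flag proposed in this thread, the argument list is cut down to
the ops word, and _bus_dmamap_sync() is a stand-in that counts how often
the expensive path would run.  demo() is a hypothetical helper.

```c
/* Illustrative values only; not copied from any real bus_dma header. */
#define BUS_DMASYNC_PREREAD   0x01
#define BUS_DMASYNC_PREWRITE  0x04
#define BUS_DMASYNC_CLEAN     0x10  /* the flag proposed in this thread */

static int barriers;    /* how many times the "locked instruction" ran */

/* Stand-in for the out-of-line backend: checks the flag at run time. */
static void
_bus_dmamap_sync(int ops)
{
        if ((ops & BUS_DMASYNC_CLEAN) != 0)
                return;
        barriers++;     /* a real x86 backend would issue its barrier here */
}

/*
 * Shim: if the compiler can see that ops is a constant, the switch
 * folds away and the "clean" cases compile to nothing.  Otherwise the
 * call falls through to the backend, which makes the same decision at
 * run time, so the observable behaviour is identical either way.
 */
static inline __attribute__((__always_inline__)) void
bus_dmamap_sync(int ops)
{
        if (!__builtin_constant_p(ops)) {
                _bus_dmamap_sync(ops);
                return;
        }
        switch (ops) {
        case BUS_DMASYNC_PREREAD | BUS_DMASYNC_CLEAN:
        case BUS_DMASYNC_PREWRITE | BUS_DMASYNC_CLEAN:
        case BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE | BUS_DMASYNC_CLEAN:
                return;
        default:
                _bus_dmamap_sync(ops);
                return;
        }
}

/* Hypothetical helper: counts the barriers run by three sample calls. */
static int
demo(int runtime_ops)
{
        barriers = 0;
        bus_dmamap_sync(BUS_DMASYNC_PREREAD | BUS_DMASYNC_CLEAN); /* skipped */
        bus_dmamap_sync(BUS_DMASYNC_PREREAD);                     /* barrier */
        bus_dmamap_sync(runtime_ops);            /* depends on the argument */
        return barriers;
}
```

With these stand-ins, demo(BUS_DMASYNC_PREWRITE) counts two barriers and
demo(BUS_DMASYNC_PREWRITE | BUS_DMASYNC_CLEAN) counts one.  Whether the
constant-ness really propagates through the inline call depends on the
compiler and optimization level, so comparing the generated assembly at
-O0 and -O2 is the way to find out whether a macro is needed instead.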

Dave

-- 
David Young
dyoung%pobox.com@localhost    Urbana, IL    (217) 721-9981

