Port-amd64 archive


Re: x86 bus_dmamap_sync



> Date: Sat, 28 Oct 2017 21:26:18 +0200
> From: Manuel Bouyer <bouyer%antioche.eu.org@localhost>
> 
> It's been a while since I looked at this (but it fixed a real bug I
> could reproduce). 

No doubt!  I would just like to have a clearer statement of what
purpose the fences serve (and a clear explanation of why they're
enough).

> But I'm not sure that without *fence instructions, two writes to the
> same location (or close locations) will be seen as two writes on the memory
> side.
> You propose to remove fence operations for *WRITE operations because
> load vs store will always happen in program order.

Almost!  I certainly propose to remove _one_ LFENCE, which I think
doesn't make any sense.  Right now, we do the following:

1. maybe read data in buffer from previous transfer
2. bus_dmamap_sync(preread)
   -> LFENCE [A] (in _bus_dmamap_sync)
3. write to register/descriptor to trigger DMA read
4. read from register/descriptor to learn of DMA completion
5. bus_dmamap_sync(postread)
   -> LFENCE [B] (in bus_dmamap_sync)
   -> optional memcpy(orig buffer, bounce buffer)
   -> LFENCE [C] (in _bus_dmamap_sync)
6. read data in buffer

LFENCE [B] is obviously necessary: the CPU must not speculatively load
out of the bounce buffer before we have learned of the DMA completion.

Since the memcpy in step (5) and the reading of data in step (6)
happen in the same thread, I can't imagine that there is any need to
have LFENCE [C].  So I propose to remove it.

As for LFENCE [A], I think _either_ it needs to be an MFENCE, in order
to order the loads in (1) before the store in (3), or it is
unnecessary because x86 never reorders a load with a later store
(load->store never becomes store->load).
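To make the x86 side of this argument concrete, here is a minimal C
sketch of the simplified x86-TSO rule from the Intel and AMD manuals,
assuming ordinary write-back memory and no non-temporal stores.  The
enums and the functions tso_may_reorder, stays_ordered and
check_preread_pair are hypothetical names for this illustration, and
treating the device-register store as an ordinary store is a
simplification.

```c
#include <assert.h>
#include <stdbool.h>

/* Memory access kinds, as in the load->store / store->load discussion. */
enum access { LOAD, STORE };

/* What sits between two accesses in program order. */
enum fence { NO_FENCE, LFENCE, SFENCE, MFENCE };

/*
 * Simplified x86-TSO rule (assumption: write-back memory, no
 * non-temporal stores, no write-combining memory): the only reordering
 * software can observe is a later load passing an earlier store.
 */
static bool tso_may_reorder(enum access earlier, enum access later)
{
    return earlier == STORE && later == LOAD;
}

/*
 * Does the pair stay in program order with fence f between them?
 * Pairs that TSO never reorders stay ordered whatever the fence;
 * store->load needs MFENCE (LFENCE and SFENCE do not order it).
 */
static bool stays_ordered(enum access earlier, enum fence f, enum access later)
{
    if (!tso_may_reorder(earlier, later))
        return true;
    return f == MFENCE;
}

/* DMA-read case: load in step (1), store to the register in step (3). */
static void check_preread_pair(void)
{
    assert(stays_ordered(LOAD, NO_FENCE, STORE)); /* ordered with no fence */
    assert(stays_ordered(LOAD, LFENCE, STORE));   /* also ordered with LFENCE */
}
```

In this model a load->store pair is reported as ordered whatever fence
sits between, which is the "unnecessary" branch; only a store followed
by a load would call for MFENCE.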

So here's the order that would make sense to me:

1. maybe read data out of buffer from previous transfer
2. bus_dmamap_sync(preread)
   -> MFENCE
3. write to register/descriptor to trigger DMA read
4. read from register/descriptor to learn of DMA completion
5. bus_dmamap_sync(postread)
   -> LFENCE [B] (in bus_dmamap_sync)
   -> optional memcpy(orig buffer, bounce buffer)
6. read data in buffer
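The proposed read-side order can be modelled as a trace of what the x86
sync routine would do for each op.  This is a hypothetical sketch, not
NetBSD code: proposed_read_sync, the helper append, and the flag values
are assumptions; only the flag names follow bus_dma(9).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Names follow bus_dma(9); the values are only for this sketch. */
#define BUS_DMASYNC_PREREAD   0x01
#define BUS_DMASYNC_POSTREAD  0x02

/* Append an action to a comma-separated trace, truncating if full. */
static void append(char *trace, size_t len, const char *action)
{
    size_t used = strlen(trace);

    assert(used < len);
    snprintf(trace + used, len - used, "%s%s", used ? "," : "", action);
}

/*
 * Hypothetical model of the proposed x86 sync for a DMA read:
 * records, in order, the fences and copies it would perform.
 */
static void proposed_read_sync(int ops, bool bounced, char *trace, size_t len)
{
    assert(len > 0);
    trace[0] = '\0';
    if (ops & BUS_DMASYNC_PREREAD) {
        /* Order loads of the previous transfer before the DMA-start store. */
        append(trace, len, "MFENCE");
    }
    if (ops & BUS_DMASYNC_POSTREAD) {
        /* [B]: no load from the buffer until DMA completion is known. */
        append(trace, len, "LFENCE");
        if (bounced)
            append(trace, len, "memcpy");  /* bounce buffer -> orig buffer */
        /* No trailing LFENCE [C]: the copy and step (6) run in one thread. */
    }
}
```

With a bounce buffer, a POSTREAD sync yields the trace LFENCE,memcpy;
without one, just LFENCE.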


For a DMA write, the current order is:

1. write data into buffer
2. bus_dmamap_sync(prewrite)
   -> maybe memcpy(bounce buffer, orig buffer)
   -> MFENCE [A]
3. write to register/descriptor to trigger DMA write
4. read from register/descriptor to learn of DMA completion
5. bus_dmamap_sync(postwrite)
   -> MFENCE [B]
6. maybe write to buffer for next transfer

I don't see a problem with this order, but I think it's stronger than
we need, even assuming that a fence is needed to order store->store or
load->store -- in particular, I think MFENCE [A] can just be an SFENCE
instead.
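The write side with MFENCE [A] weakened to SFENCE can be sketched in the
same hypothetical trace form.  proposed_write_sync, the helper append,
and the flag values are assumptions for illustration; MFENCE [B] is left
alone, since only [A] is proposed to change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Names follow bus_dma(9); the values are only for this sketch. */
#define BUS_DMASYNC_PREWRITE  0x04
#define BUS_DMASYNC_POSTWRITE 0x08

/* Append an action to a comma-separated trace, truncating if full. */
static void append(char *trace, size_t len, const char *action)
{
    size_t used = strlen(trace);

    assert(used < len);
    snprintf(trace + used, len - used, "%s%s", used ? "," : "", action);
}

/*
 * Hypothetical model of the x86 sync for a DMA write with
 * MFENCE [A] replaced by SFENCE.
 */
static void proposed_write_sync(int ops, bool bounced, char *trace, size_t len)
{
    assert(len > 0);
    trace[0] = '\0';
    if (ops & BUS_DMASYNC_PREWRITE) {
        if (bounced)
            append(trace, len, "memcpy");  /* orig buffer -> bounce buffer */
        /* [A]: buffer stores before the store that starts the DMA. */
        append(trace, len, "SFENCE");
    }
    if (ops & BUS_DMASYNC_POSTWRITE) {
        /* [B]: unchanged from the current order. */
        append(trace, len, "MFENCE");
    }
}
```

The copy into the bounce buffer is a run of stores, so in this sketch an
SFENCE after it is what keeps those stores ahead of the register store in
step (3).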


> But the problem is with store vs store.

That part I'm puzzled by.  Not saying you're wrong, but I would like
to have a clearer idea of what the model is, since the Intel and AMD
manuals for their _CPUs_, at least, say unequivocally that -- except
for non-temporal stores or write-combining memory, which are not
relevant here -- stores are never reordered with other stores.

Can anyone cite documentation about the ordering of a CPU's stores
that a DMA controller on the system bus can witness?

