tech-net archive


Re: tlp(4) DMA synchronization

On Fri, Aug 28, 2009 at 12:34:45PM -0500, David Young wrote:
> > Consider the following scenario:
> > 
> > (1) rxdescs[0].td_status in rxintr is polled and cached
> > (2) the received packet for rxdescs[0] is handled
> > (3) rxdescs[0] data in cacheline is updated for the next RX op
> >     in TULIP_INIT_RXDESC() and then the cacheline is marked dirty
> > (4) rxdescs[0] data in the cacheline is written back and invalidated
> >     by bus_dmamap_sync(9) op at the end of TULIP_INIT_RXDESC()
> > 
> > If the cachelinesize is larger than sizeof rxdescs
> > (i.e. the same cacheline also fetches rxdescs[1])
> > and rxdescs[1] for the next descriptor is being updated
> > (to clear TDSTAT_OWN) by the device between (1) and (4),
> > the updated data will be lost by the writeback op at (4).
> > We can put a PREREAD sync op before (3), but a race could still
> > happen between (3) and (4) because of write allocate at (3).
> That is just the scenario that I had in mind.  I think that we can use
> ring mode and avoid that scenario, if we postpone step (3) until the NIC
> is finished with the rest of the Rx descriptors in the same cacheline,
> rxdescs[1] through rxdescs[descs_per_cacheline - 1].

For RX you also need to refill each cache-line sized block of
descriptors in reverse order (last descriptor first), so that they
become available to the chip all together even if a cache flush
intervenes part-way through.
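
A minimal sketch of the reverse-order refill, as a plain C model rather
than real tlp(4) code. The descriptor layout, DESC_OWN (modelled on
TDSTAT_OWN) and rx_refill_group() are all hypothetical names made up for
illustration. The idea is that the chip stops at the first descriptor it
does not own, so if the first descriptor of the group is handed over
last, a partial writeback should not let the chip run into a half-filled
group.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DESC_OWN 0x80000000u	/* hypothetical "owned by chip" bit */

/* Simplified 16-byte descriptor; NOT the real tlp(4) layout. */
struct rx_desc {
	uint32_t status;
	uint32_t ctl;
	uint32_t bufaddr1;
	uint32_t bufaddr2;
};

/*
 * Give `count` descriptors starting at ring[first] back to the chip,
 * walking from the last one down to the first.  ring[first] is the
 * one the chip looks at next, so it gets its OWN bit last.
 */
static void
rx_refill_group(struct rx_desc *ring, size_t first, size_t count,
    const uint32_t *bufaddrs)
{
	assert(count > 0);

	for (size_t i = count; i-- > 0; ) {
		struct rx_desc *d = &ring[first + i];

		d->bufaddr1 = bufaddrs[i];
		d->status = DESC_OWN;
	}
	/*
	 * A real driver would follow this with a single
	 * bus_dmamap_sync(9) covering the whole group.
	 */
}
```

Here `count` would be descs_per_cacheline and `first` a multiple of it,
assuming the ring itself is cache-line aligned.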

TX seems rather harder, since the chip will be writing TX status while
the driver is filling in later entries in the same cache line.
Ignoring TX status for all but the last active descriptor might work.
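
A rough sketch of what "ignore all but the last" could look like, again
as a standalone C model. The struct, DESC_OWN and tx_reclaim() are
hypothetical, and it assumes the chip completes TX descriptors strictly
in ring order. Note that it throws away per-descriptor error status for
everything except the last entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DESC_OWN 0x80000000u	/* hypothetical "owned by chip" bit */

/* Simplified descriptor; NOT the real tlp(4) layout. */
struct tx_desc {
	uint32_t status;
	uint32_t ctl;
	uint32_t bufaddr1;
	uint32_t bufaddr2;
};

/*
 * Look only at the last active descriptor.  If the chip has released
 * it, assume every earlier active descriptor is finished as well and
 * report all of them as reclaimable; otherwise reclaim nothing.
 * Returns the number of descriptors that can be reused.
 */
static size_t
tx_reclaim(const struct tx_desc *ring, size_t ring_size, size_t first,
    size_t active)
{
	assert(ring_size > 0 && active <= ring_size);

	if (active == 0)
		return 0;

	size_t last = (first + active - 1) % ring_size;

	if (ring[last].status & DESC_OWN)
		return 0;	/* chip still owns the tail */

	return active;
}
```

The driver then never depends on status words that share a cache line
with entries it is still filling in, at the cost of reclaiming in
coarser batches.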

Linked mode for TX can put multiple descriptors per cache line, provided
they are all for the same frame.

Presumably uncached memory could be used for the rings?
(Or double-map, cached for reads and uncached for writes!)


David Laight:
