tech-net archive

Re: NetBSD 5.1 TCP performance issue (lots of ACK)

On Sat, Oct 29, 2011 at 01:37:40PM -0700, Dennis Ferguson wrote:
> On 29 Oct, 2011, at 12:59 , Manuel Bouyer wrote:
> > On Fri, Oct 28, 2011 at 06:55:30PM +0100, David Laight wrote:
> >> On Fri, Oct 28, 2011 at 04:10:36PM +0200, Manuel Bouyer wrote:
> >>> Here is an updated patch. The key point to avoid the receive errors is
> >>> to do another BUS_DMASYNC after reading wrx_status, before reading the
> >>> other values to avoid reading e.g. len before status gets updated.
> >>> The errors were because of 0-len receive descriptors.
> >> 
> >> I'm not entirely clear where the mis-ordering happens. I presume the
> >> fields are volatile so gcc won't re-order them. Which seems to imply
> >> that the only problem can be the adapter writing the fields in the
> >> wrong order (unless the data is cached and spans cache lines).
> >> In that case the BUS_DMASYNC is also acting as a delay.
> > 
> > AFAIK the CPU is allowed to reorder reads. linux has a rmb() here,
> > which is an equivalent of our x86_lfence() I guess.
> > But for platforms where BUS_DMASYNC is not a simple barrier,
> > 2 BUS_DMASYNC calls are needed.
> CPUs in general are allowed to reorder reads, but Intel and AMD
> x86 CPUs in particular won't do that.  The linux rmb() expands to
> an empty asm() statement, essentially (not quite) a NOP.

I have established that in -current, at least, the compiler
doesn't reorder the reads in wm_rxintr().  People seem to disagree
whether an x86 CPU will reorder the reads. :-)

According to <,2>, x86
will reorder reads:

        ..., x86 CPUs give no ordering guarantees for loads, so the smp_mb()
        and smp_rmb() primitives expand to lock;addl.

In NetBSD-current, membar_consumer() is

                addl    $0, -4(%esp)

which resembles the x86_lfence() that bus_dmamap_sync(POSTREAD) calls,

                addl    $0, -4(%esp)

I believe that on a UP machine, the LOCK prefix in membar_consumer() is
overwritten with a NOP.  The LOCK prefix in x86_lfence() is not erased
in that way.  Is the LOCK prefix important to the proper operation of
bus_dmamap_sync() even on a UP machine?
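
For reference, the ordering under discussion can be sketched in portable
C as below. This is only an illustration, not the wm(4) code: struct
rx_desc, RXD_STAT_DD and rx_desc_len() are hypothetical names, and the
C11 atomic_thread_fence() merely stands in for whatever the driver
really uses (bus_dmamap_sync(), x86_lfence(), rmb()). Strictly speaking
a C11 fence is only defined in terms of atomic accesses, so treat it as
a placeholder for "no later load may be satisfied before this point".

```c
#include <stdatomic.h>
#include <stdint.h>

#define RXD_STAT_DD 0x01u	/* hypothetical "descriptor done" bit */

/* Hypothetical receive descriptor; not the real wm(4) layout. */
struct rx_desc {
	volatile uint16_t len;		/* written by the adapter */
	volatile uint8_t  status;	/* written last by the adapter */
};

/*
 * Return the frame length if the descriptor is complete, -1 otherwise.
 * The status field is read first; len is read only after the barrier,
 * so a stale (e.g. zero) len cannot be paired with a fresh status.
 */
static int
rx_desc_len(const struct rx_desc *d)
{
	uint8_t status = d->status;

	if ((status & RXD_STAT_DD) == 0)
		return -1;

	/* Stand-in for the second BUS_DMASYNC / read barrier. */
	atomic_thread_fence(memory_order_acquire);

	return d->len;
}
```

Without the fence, a CPU (or compiler) that is free to reorder loads
could fetch len before status, which would match the 0-len descriptors
reported earlier in the thread.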


David Young    Urbana, IL    (217) 721-9981