Re: NetBSD 5.1 TCP performance issue (lots of ACK)

To: tech-net%NetBSD.org@localhost
Subject: Re: NetBSD 5.1 TCP performance issue (lots of ACK)
From: David Young <dyoung%pobox.com@localhost>
Date: Sat, 31 Dec 2011 12:44:45 -0600

On Wed, Nov 23, 2011 at 12:30:39PM +0100, Manuel Bouyer wrote:
> On Wed, Nov 23, 2011 at 12:12:05PM +0100, Manuel Bouyer wrote:
> > On Tue, Nov 22, 2011 at 03:10:52PM -0800, Dennis Ferguson wrote:
> > > [...]
> > > You are assuming the above somehow applied to Intel CPUs which existed
> > > in 2004, but that assumption is incorrect.  There were no Intel (or AMD)
> > > CPUs which worked like that in 2004, since post-2007 manuals document the
> > > ordering behavior of all x86 models from the 386 forward, and explicitly
> > > say that none of them have reordered reads, so the above could only be a
> > > statement of what they expected future CPUs might do and not what
> > > they actually did.
> > 
> > This is clearly not my experience. I can say for sure that without lfence
> > instructions, the xen front/back drivers are not working properly
> > (and I'm not the only one saying this).
> 
> To be more specific: on linux, rmb() is *not* a simple compiler barrier,
> it's either lock; addl $0,0(%%esp) or lfence depending on CPU
> target.
> smp_rmb() is defined to either barrier() (a compiler barrier) or
> rmb() when compiled with CONFIG_X86_PPRO_FENCE option.
> > 
> > > 
> > > This is clear in the post-2007 revision I have, where the section you
> > > quote above now says:
> > 
> > It also says that we should not rely on this behavior and that, for
> > compatibility with future processors, programmers should use memory
> > barrier instructions where appropriate.
> > 
> > Anyway, what prompted this discussion is the added bus_dmamap_sync()
> > in the wm driver. It's needed because:
> > - we may be using bounce buffering, and we don't know in which order
> >   the copy to bounce buffer is done
> > - all the world is not x86.
> 
> Also, the Intel manual specifies what happens between CPUs, it doesn't
> say what happens when main memory is written to by a DMA device.

The ordering question seems to be settled for write-back cached memory
that is accessed exclusively by one or more Intel CPUs: reads will not
be re-ordered on any past or present Intel CPU, so no memory barriers
are necessary to protect against said re-ordering.

I still have some doubts that the same rules apply to memory that is
uncached or write-through cached, or to memory that is write-back cached
but shared with bus-mastering peripherals.  Perhaps that is the reason
the fence instructions exist?
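
To make the pattern in question concrete, here is a minimal sketch of the
"check the done bit, then read the rest of the descriptor" sequence that a
receive path performs.  Everything in it is an assumption for illustration:
struct rx_desc, RXD_DONE, fake_device_write() and rx_poll() are hypothetical
names, not the wm(4) descriptor layout or code, and C11 atomics only model
CPU-to-CPU ordering.  In a NetBSD driver the barrier below would be the job
of bus_dmamap_sync() with BUS_DMASYNC_POSTREAD, which also covers bounce
buffers and non-x86 machines.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical receive descriptor; NOT the real wm(4) layout. */
struct rx_desc {
	uint16_t len;            /* payload length, written by the "device" */
	_Atomic uint8_t status;  /* the "device" sets RXD_DONE last */
};

#define RXD_DONE 0x01

/* Stand-in for the device: fill in the length, then publish DONE. */
static void
fake_device_write(struct rx_desc *d, uint16_t len)
{
	assert(d != NULL);
	d->len = len;
	atomic_store_explicit(&d->status, RXD_DONE, memory_order_release);
}

/* Driver side: return the length if the descriptor is complete, else -1. */
static int
rx_poll(struct rx_desc *d)
{
	uint8_t st;

	assert(d != NULL);
	st = atomic_load_explicit(&d->status, memory_order_relaxed);
	if ((st & RXD_DONE) == 0)
		return -1;

	/* Read barrier: the load of len must not be satisfied before status. */
	atomic_thread_fence(memory_order_acquire);
	return d->len;
}
```

On x86 compilers normally emit no instruction at all for that acquire
fence; it only constrains the compiler.  That matches the settled case
above (write-back memory touched only by CPUs) and is exactly what I am
unsure about once a bus-mastering device is the writer.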

(BTW, recently I read a NetBSD kernel profile where x86_mfence and
x86_lfence together accounted for 5% of the kernel run time.  That seems
like an awful lot of time to spend on those barriers if they really are
unnecessary!)

Dave

-- 
David Young
dyoung%pobox.com@localhost    Urbana, IL    (217) 721-9981
