tech-net archive

Re: NetBSD 5.1 TCP performance issue (lots of ACK)



On Fri, Oct 28, 2011 at 04:10:36PM +0200, Manuel Bouyer wrote:
> On Thu, Oct 27, 2011 at 07:51:43PM +0200, Manuel Bouyer wrote:
> > On Thu, Oct 27, 2011 at 12:00:33PM -0400, Thor Lancelot Simon wrote:
> > > It's possible this has to do with the interrupt moderation tuning.  I
> > > believe we've been pending the checkin of better values than the ones
> > > I worked out from the documentation for quite some time -- there were
> > > highly unobvious performance effects with small buffers.  Simon did
> > > a bunch of testing and concluded, as I recall, that the values used
> > > by Intel in the Linux driver were "magic" and that we should use
> > > those, not mine.
> > > 
> > > If this hasn't been adjusted to match the Linux driver, you might
> > > want to take a quick look at the values it uses and see whether
> > > they yield better small-buffer performance in your case.
> > 
> > I looked quickly at this and came up with the attached patch.
> > 
> > With this (installed on both NetBSD hosts) I get mixed results:
> > - the NetBSD client against the Linux server gets degraded and unstable
> >   performance; several runs give large variations in speed
> > - the NetBSD client against the NetBSD server gets better performance
> >   on average (but still not in the 90MB range), also with large
> >   variations between runs
> > - the Linux client against the NetBSD server gets a little boost and
> >   the speed is still stable between runs
> > - ttcp performance between NetBSD hosts gets a little boost too,
> >   and the speed is still stable between runs
> > 
> > But I do get Ierrs on both NetBSD hosts now, with the ttcp or glusterfs
> > test. I don't know where these errors come from. Linux has no errors.
> > I don't think it's wm_add_rxbuf(); netstat -m and vmstat -m show
> > no issues with mbuf allocations.
> > So I guess these are errors at the adapter level; we may need to change
> > more things to match these values.
> > Also, Linux seems to be using more advanced features for these adapters;
> > this is something we may have to look at too.
> 
> Here is an updated patch. The key point to avoid the receive errors is
> to do another BUS_DMASYNC after reading wrx_status and before reading the
> other values, so that e.g. len is not read before status gets updated.
> The errors were caused by 0-len receive descriptors.

Good catch!  Question, though:

> Index: sys/dev/pci/if_wm.c
> ===================================================================
> RCS file: /cvsroot/src/sys/dev/pci/if_wm.c,v
> retrieving revision 1.162.4.15
> diff -u -p -u -r1.162.4.15 if_wm.c
> --- sys/dev/pci/if_wm.c       7 Mar 2011 04:14:19 -0000       1.162.4.15
> +++ sys/dev/pci/if_wm.c       28 Oct 2011 14:03:33 -0000
> @@ -2879,11 +2907,7 @@ wm_rxintr(struct wm_softc *sc)
>                   device_xname(sc->sc_dev), i));
>  
>               WM_CDRXSYNC(sc, i, BUS_DMASYNC_POSTREAD|BUS_DMASYNC_POSTWRITE);
> -
>               status = sc->sc_rxdescs[i].wrx_status;
> -             errors = sc->sc_rxdescs[i].wrx_errors;
> -             len = le16toh(sc->sc_rxdescs[i].wrx_len);
> -             vlantag = sc->sc_rxdescs[i].wrx_special;
>  
>               if ((status & WRX_ST_DD) == 0) {
>                       /*
> @@ -2892,6 +2916,14 @@ wm_rxintr(struct wm_softc *sc)
>                       WM_CDRXSYNC(sc, i, BUS_DMASYNC_PREREAD);
>                       break;
>               }

Should

>                       WM_CDRXSYNC(sc, i, BUS_DMASYNC_PREREAD);

move above 

>               if ((status & WRX_ST_DD) == 0) {

?

>                       /*
> +             /*
> +              * sync again, to make sure the values below have been read
> +              * after status.
> +              */
> +             WM_CDRXSYNC(sc, i, BUS_DMASYNC_POSTREAD|BUS_DMASYNC_POSTWRITE);
> +             errors = sc->sc_rxdescs[i].wrx_errors;
> +             len = le16toh(sc->sc_rxdescs[i].wrx_len);
> +             vlantag = sc->sc_rxdescs[i].wrx_special;
>  
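
For anyone following along, the ordering the patch is after (check
status first, and only then read the other descriptor fields) can be
modelled in plain userland C.  This is only a sketch under assumptions,
not the driver code: struct rx_desc, RX_ST_DD, fake_device_complete()
and rx_poll() are made-up names, and a C11 acquire fence stands in for
the second WM_CDRXSYNC() call, which is an analogy rather than a claim
about what bus_dmamap_sync() does on any given platform.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical "descriptor done" bit, modelled on WRX_ST_DD. */
#define RX_ST_DD 0x01

/* Simplified stand-in for a receive descriptor (not the wm(4) layout). */
struct rx_desc {
	uint16_t len;
	uint8_t errors;
	_Atomic uint8_t status;
};

/*
 * Stand-in for the adapter: fill in the payload fields first and
 * publish the status byte last, so DD means "the rest is valid".
 */
static void
fake_device_complete(struct rx_desc *d, uint16_t len)
{
	d->len = len;
	d->errors = 0;
	atomic_store_explicit(&d->status, RX_ST_DD, memory_order_release);
}

/*
 * Consumer side: read status, bail out if the descriptor is not done,
 * and only read len after a fence that orders it behind the status read.
 * Returns true and stores the length in *len when a frame is ready.
 */
static bool
rx_poll(struct rx_desc *d, uint16_t *len)
{
	uint8_t status = atomic_load_explicit(&d->status, memory_order_relaxed);

	if ((status & RX_ST_DD) == 0)
		return false;

	/* Analogue of the extra sync: len must not be read before status. */
	atomic_thread_fence(memory_order_acquire);
	*len = d->len;
	return true;
}
```

Reading len in the same batch as status, as the old code did, is what
leaves room for a stale (e.g. zero) length to be paired with a status
that already has DD set.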

Dave

-- 
David Young             OJC Technologies is now Pixo
dyoung%pixotech.com@localhost     Urbana, IL   (217) 344-0444 x24

