tech-net archive


Re: wm(4) and the maximum buffer length for TSO



On Thu, Sep 20, 2012 at 09:12:00AM +0200, Manuel Bouyer wrote:
> On Wed, Sep 19, 2012 at 06:34:15PM -0500, David Young wrote:
> > wm(4) sets up its Tx DMA maps like this,
> > 
> >                 if ((error = bus_dmamap_create(sc->sc_dmat, WM_MAXTXDMA,
> >                             WM_NTXSEGS, WTX_MAX_LEN, 0, 0,
> >                             &sc->sc_txsoft[i].txs_dmamap)) != 0) {
> > 
> > WM_MAXTXDMA is round_page(IP_MAXPACKET) == round_page(65535) ==
> > 65536.  Thus wm(4) will fail to map for Tx any mbuf whose
> > m_pkthdr.len > 65536.  That's ok if tcp_output() produces a buffer
> > no longer than 65536 bytes for the NIC to segment, but in practice
> > it will produce a longer buffer because first it clamps the length
> > to IP_MAXPACKET,
> > 
> >                 if (use_tso) {
> >                         /*
> >                          * Truncate TSO transfers to IP_MAXPACKET, and make
> >                          * sure that we send equal size transfers down the
> >                          * stack (rather than big-small-big-small-...).
> >                          */
> > #ifdef INET6
> >                         CTASSERT(IPV6_MAXPACKET == IP_MAXPACKET);
> > #endif
> >                         len = (min(len, IP_MAXPACKET) / txsegsize) *
> >                             txsegsize;
> > 
> >     ...
> > 
> > and then it adds in the combined length of the IP and TCP headers:
> > 
> >         m->m_pkthdr.len = hdrlen + len;
> > 
> > In this way, wm(4) can see m->m_pkthdr.len greater than 65536 and fail
> > to map m.  It will send no feedback to TCP to stop trying to send such
> > long un-segmented buffers.  Also, it looks to me like it will retry
> > forever to map the same mbuf for DMA---that matches the misbehavior that
> > we're seeing at $DAYJOB, where the wm(4) ceases to transmit anything.
> 
> It's strange that I didn't run into this, I do use TSO with wm(4).
> I guess that the problem is dependent on the value of txsegsize:
> if its value is right, len will be rounded down and there is
> enough space for the header.

I rounded 65535 using several values of txsegsize that I thought were
likely, and every one yielded a len substantially less than 65535, so
there should have been plenty of room for a header.  I'll have to log
the txsegsize sometime and see what is actually chosen.
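
The rounding is easy to reproduce outside the kernel.  What follows is
a minimal sketch, not the kernel code: the function names are made up
for illustration, WM_MAXTXDMA is hard-coded on the assumption of 4 KiB
pages, and hdrlen is whatever the caller assumes for the combined IP
and TCP header length.

```c
#include <assert.h>

#define IP_MAXPACKET	65535u	/* same value the quoted clamp uses */
#define WM_MAXTXDMA	65536u	/* round_page(IP_MAXPACKET), assuming 4 KiB pages */

/*
 * The clamp quoted from tcp_output(): cap at IP_MAXPACKET, then round
 * down to a whole number of txsegsize-byte segments.
 * (Hypothetical helper name, for illustration only.)
 */
static unsigned
tso_payload_len(unsigned len, unsigned txsegsize)
{
	assert(txsegsize > 0);
	if (len > IP_MAXPACKET)
		len = IP_MAXPACKET;
	return (len / txsegsize) * txsegsize;
}

/* Nonzero if hdrlen + payload would not fit in a WM_MAXTXDMA-byte map. */
static int
exceeds_dma_map(unsigned len, unsigned txsegsize, unsigned hdrlen)
{
	return hdrlen + tso_payload_len(len, txsegsize) > WM_MAXTXDMA;
}
```

With hdrlen assumed to be 52, a txsegsize of 1448 gives len 65160 and
1460 gives 64240, both of which fit.  A hypothetical txsegsize of 1424
(not a value observed in this thread) gives 65504, and 65504 + 52 =
65556 does not fit, so whether the map fails really does hinge on the
segment size.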

> > and tcp_output() should be very careful not to send a buffer any
> > longer than what is supported.  What do you think?
> 
> I think we need something like that.

Ok.  It seems that tcp_output() should be rejiggered to compute the
hdrlen before finalizing len (if that is possible) and then to clamp len
at (IP_MAXPACKET - hdrlen).  I guess that we can prevail upon the driver
to create a DMA map whose size is IP_MAXPACKET + ethernet header length
+ other encapsulation overhead (e.g., VLAN tag), but I *think* I would
prefer that all of the encapsulation overhead between TCP and the wire
was conveyed to TCP.  Maybe in the TCP/IP Stack Overhaul of Our Dreams,
we can do something like that.
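
Roughly, the clamp I have in mind would look like the sketch below.
This is only an illustration of the idea, not a patch against
tcp_output(): the function name is hypothetical, and it assumes hdrlen
is already known at the point where len is finalized, which is the part
that may need rejiggering.

```c
#include <assert.h>

#define IP_MAXPACKET	65535u

/*
 * Hypothetical variant of the TSO clamp: reserve room for the headers
 * first, then round down to a whole number of segments, so that
 * hdrlen + len can never exceed IP_MAXPACKET.
 */
static unsigned
tso_payload_len_hdr(unsigned len, unsigned txsegsize, unsigned hdrlen)
{
	unsigned room;

	assert(txsegsize > 0 && hdrlen < IP_MAXPACKET);
	room = IP_MAXPACKET - hdrlen;
	if (len > room)
		len = room;
	return (len / txsegsize) * txsegsize;
}
```

The result plus hdrlen is at most IP_MAXPACKET for any txsegsize, which
is below the 65536 bytes that the wm(4) map allows, although it still
leaves no allowance for the ethernet header or a VLAN tag.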

Dave

-- 
David Young
dyoung%pobox.com@localhost    Urbana, IL    (217) 721-9981

