Subject: Re: Add "last record" and "last mbuf" pointers to sockbuf
To: Jason R Thorpe <thorpej@wasabisystems.com>
From: Jonathan Stone <jonathan@DSG.Stanford.EDU>
List: tech-kern
Date: 07/03/2002 14:02:39
In message <20020701143001.E3496@dr-evil.shagadelic.org>Jason R Thorpe writes:
>
>...going back to a conversation I started a few months ago...
>
>I am planning on adding "last record" and "last mbuf" pointers to
>the sockbuf structure.  This makes insertion of new data into the
>socket buffer O(C) rather than O(N).  When you're using large socket
>buffers, this really makes a difference.

A few points, in no particular order... 

  *  Great stuff.  Let's do it.

  * Yes, kttcp lets you focus on just the socket/mbuf-internal portions.
    However, if  typical usage *does* go to userspace, kttcp
    will overstate the relative (percentage), but not absolute
    (seconds/milliseconds of walltime) speedup.     I'm sure you know
    this. (I'm mostly wondering if the  30% speedup on a DP83820
    you mentioned yesterday included [kernel<->userspace overhead, or not.)

  * Some of the the names don't fit the pre-existing mbuf code, e.g.,
    sbappendrecord() vs. sbappend_stream()?
    I think sbappendstream() is a better fit.

    The SB_UPDATE_TAIL() macro gets used a lot, and what it
    does wasn't immediatley clear. I'd find the patch much easier
    to read/follow if this macro-name included some hint that it just
    checks for a null socketqueue; and if so, nulls the tail pointer.
    I dunno, SB_EMPTY_FIXUP() or something similar? 

  *  Your dgram bugs: For a dgram socket,  the "tail" pointer
     wants to point to the head of the last *chain* (not the last mbuf
     in the chain).  For a pure stream socket, it wants
     to point to the last mbuf in the last chain.  [I need more
     coffee to remember how streams with record boundaries (TP4) work].
     I'll hazard a guess that your bugs are here, somewhere.

     I promise to read the patch over the weekend.

  * is it worth doing a TCP `fast path' in sosend?.
    The historical motivation was perverse apps which did
    large numbers of small (3-byte) write()s on TCP sockets.
    It's a natural fit with keeping an explicit tail pointer.
    (IIRC there were certain  workloads/ benchmarks where it made
    a signficant win. However the motivating apps/benchmarks may be long-dead).

  *  Again: great stuff.  Let's iron the bugs out and do it.