Subject: Re: Add "last record" and "last mbuf" pointers to sockbuf
To: Jonathan Stone <jonathan@DSG.Stanford.EDU>
From: Jason R Thorpe <thorpej@wasabisystems.com>
List: tech-kern
Date: 07/03/2002 14:26:33
On Wed, Jul 03, 2002 at 02:02:39PM -0700, Jonathan Stone wrote:

 >   *  Great stuff.  Let's do it.

Great!  Done!  :-)

(I checked a working version in earlier today :-)

 >   * Yes, kttcp lets you focus on just the socket/mbuf-internal portions.
 >     However, if  typical usage *does* go to userspace, kttcp
 >     will overstate the relative (percentage), but not absolute
 >     (seconds/milliseconds of walltime) speedup.     I'm sure you know
 >     this. (I'm mostly wondering if the  30% speedup on a DP83820
 >     you mentioned yesterday included [kernel<->userspace overhead, or not.)

All my before and after numbers are using kttcp right now.

Allen Briggs and I have discovered, BTW, that DP83820 cards are VERY
sensitive to being plugged into a decent bus, and also seem to be
very sensitive to the presence of other PCI cards (probably related to
how the BIOS slices up the PCI bandwidth in the LAT_TMR setting).

Yes, I realize that kttcp isn't a good model for apps which run in
userspace.  However:

	1. It still lets me focus on other issues more easily.

	2. Allen and I are both also trying to improve an application
	   which will run in the kernel :-)

 >   * Some of the the names don't fit the pre-existing mbuf code, e.g.,
 >     sbappendrecord() vs. sbappend_stream()?
 >     I think sbappendstream() is a better fit.

Hm, I guess I could nuke the _.  The difference being that sbappendrecord()
appends a new record, and sbappend_stream() is sbappend(), but for STREAM
protocols :-)

I'll go ahead and change it though.

 >     The SB_UPDATE_TAIL() macro gets used a lot, and what it
 >     does wasn't immediatley clear. I'd find the patch much easier
 >     to read/follow if this macro-name included some hint that it just
 >     checks for a null socketqueue; and if so, nulls the tail pointer.
 >     I dunno, SB_EMPTY_FIXUP() or something similar? 

Hm, yah, I could rename this, too.  I'm not too good at picking names,
and your suggestion is probably more clear than mine :-)

 >   *  Your dgram bugs: For a dgram socket,  the "tail" pointer
 >      wants to point to the head of the last *chain* (not the last mbuf
 >      in the chain).  For a pure stream socket, it wants
 >      to point to the last mbuf in the last chain.  [I need more
 >      coffee to remember how streams with record boundaries (TP4) work].
 >      I'll hazard a guess that your bugs are here, somewhere.

Yes, I realize that the tail pointer wants to know the first mbuf
of the last record in that case.  That's why I have both sb_lastrecord
and sb_mbtail.  The former is used for datagrams, the latter is used
for pure streams.

STREAM protocols with record boundaries would not be allowed to use the
new sbappendtream() .. the invariant for that function is that there will
never be more than one record in the socket buffer.  That's where all of
its speed comes from.

 >      I promise to read the patch over the weekend.

Please read the latest one I posted (which is the one I checked in).

 >   * is it worth doing a TCP `fast path' in sosend?.
 >     The historical motivation was perverse apps which did
 >     large numbers of small (3-byte) write()s on TCP sockets.
 >     It's a natural fit with keeping an explicit tail pointer.
 >     (IIRC there were certain  workloads/ benchmarks where it made
 >     a signficant win. However the motivating apps/benchmarks may be long-dead).

I would not mind having specialized sosend and soreceive functions for stream
only protocols ... i.e. ones which would never have address or control mbufs.
But the profiles don't show much time being wasted in those routines, so it
doesn't seem worth the effort, at this point.

-- 
        -- Jason R. Thorpe <thorpej@wasabisystems.com>