Subject: Slight tweak to tcp_output()
To: None <tech-net@netbsd.org>
From: Jason R Thorpe <thorpej@zembu.com>
List: tech-net
Date: 07/31/2001 09:15:01
--82I3+IH0IqGh5yIs
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

As some people may have noticed, I've been doing a bit of profiling
of the TCP transmit path.  One thing that popped up in my profile was
that quite a bit of time is spent in memcpy().  This is coming from
the code that builds a TCP packet from the socket buffer (this code
is now self-contained in a function called tcp_build_datapkt()).

Essentially, for all data small enough to fit in an mbuf cluster, the
data was being deep-copied from the socket buffer into the cluster.

In the benchmarks I'm using, the user is going large writes to the socket,
so the data in the socket buffer is sitting in cluster mbufs.  This means
that a shallow-copy could simply reference the data in the socket buffer,
rather than having to copy it out into another buffer.

I made this change, and I managed to squeeze another 1000KB/s or so
in my benchmarks (it was a bit less than I'd hoped for, but it's
pretty obvious to me now that I have to attack copyin()).  But before
I committed it, I wanted some feedback on it.  The patch is attached.

-- 
        -- Jason R. Thorpe <thorpej@zembu.com>

--82I3+IH0IqGh5yIs
Content-Type: text/plain; charset=us-ascii
Content-Description: tcp_output.c-patch
Content-Disposition: attachment; filename=foo

Index: tcp_output.c
===================================================================
RCS file: /cvsroot/syssrc/sys/netinet/tcp_output.c,v
retrieving revision 1.70
diff -c -r1.70 tcp_output.c
*** tcp_output.c	2001/07/31 02:25:22	1.70
--- tcp_output.c	2001/07/31 16:02:12
***************
*** 335,376 ****
  		tcpstat.tcps_sndpack++;
  		tcpstat.tcps_sndbyte += len;
  	}
! #ifdef notyet
! 	if ((m = m_copypack(so->so_snd.sb_mb, off,
! 	    (int)len, max_linkhdr + hdrlen)) == 0)
! 		return (ENOBUFS);
  	/*
! 	 * m_copypack left space for our hdr; use it.
  	 */
- 	m->m_len += hdrlen;
- 	m->m_data -= hdrlen;
- #else
  	MGETHDR(m, M_DONTWAIT, MT_HEADER);
! 	if (m != NULL &&
! 	    (max_linkhdr + hdrlen > MHLEN ||
! 	     max_linkhdr + hdrlen + len <= MCLBYTES)) {
  		MCLGET(m, M_DONTWAIT);
  		if ((m->m_flags & M_EXT) == 0) {
  			m_freem(m);
! 			m = NULL;
  		}
! 	}
! 	if (m == NULL)
! 		return (ENOBUFS);
! 	m->m_data += max_linkhdr;
! 	m->m_len = hdrlen;
! 	if (len <= M_TRAILINGSPACE(m)) {
  		m_copydata(so->so_snd.sb_mb, off, (int) len,
  		    mtod(m, caddr_t) + hdrlen);
! 		m->m_len += len;
! 	} else {
! 		m->m_next = m_copy(so->so_snd.sb_mb, off, (int) len);
! 		if (m->m_next == NULL) {
! 			m_freem(m);
! 			return (ENOBUFS);
! 		}
  	}
- #endif
  
  	*mp = m;
  	return (0);
--- 335,386 ----
  		tcpstat.tcps_sndpack++;
  		tcpstat.tcps_sndbyte += len;
  	}
! 
  	/*
! 	 * Allocate a header mbuf and put the TCP/IP headers
! 	 * into it with room enough for the link header later.
! 	 * Note we don't simply put it at the end of the mbuf,
! 	 * since we might insert IP options, etc. later on.
  	 */
  	MGETHDR(m, M_DONTWAIT, MT_HEADER);
! 	if (__predict_false(m == NULL))
! 		return (ENOBUFS);
! 	m->m_data += max_linkhdr;
! 
! 	if ((hdrlen + max_linkhdr) > MHLEN) {
  		MCLGET(m, M_DONTWAIT);
  		if ((m->m_flags & M_EXT) == 0) {
  			m_freem(m);
! 			return (ENOBUFS);
  		}
! 		m->m_data += max_linkhdr;
! 	} else if ((len + hdrlen + max_linkhdr) <= MHLEN) {
! 		/*
! 		 * If sending data small enough to fit in a single
! 		 * header mbuf, just copy into the one we already
! 		 * have.
! 		 */
  		m_copydata(so->so_snd.sb_mb, off, (int) len,
  		    mtod(m, caddr_t) + hdrlen);
! 		m->m_len = len + hdrlen;
! 
! 		*mp = m;
! 		return (0);
! 	}
! 
! 	m->m_len = hdrlen;
! 
! 	/*
! 	 * Now do a shallow-copy from the socket buffer, and point
! 	 * the header's `next' pointer at the copy.  This will hopefully
! 	 * save us some copying if the user is doing large writes into
! 	 * the socket (and thus getting clusters into the socket buffer).
! 	 */
! 	m->m_next = m_copym(so->so_snd.sb_mb, off, len, M_DONTWAIT);
! 	if (__predict_false(m->m_next == NULL)) {
! 		m_freem(m);
! 		return (ENOBUFS);
  	}
  
  	*mp = m;
  	return (0);

--82I3+IH0IqGh5yIs--