Subject: Experimental zero-copy for TCP and UDP transmit-side
To: None <current-users@netbsd.org>
From: Jason R Thorpe <thorpej@wasabisystems.com>
List: current-users
Date: 05/02/2002 11:32:24
Hi folks...

I have added some experimental (and optional) code that enables page
loaning for large (>= 4K) writes to sockets.  Combined with a TCP fix
I committed a few days ago, this gets us to zero-copy for the TCP transmit
side.  In tests on an embedded system with limited memory bandwidth, TCP
transmit performance on 100baseTX-FDX went from ~6500KB/s to ~11100KB/s,
an improvement of roughly 70%.

It's also worth noting that a server application that mmap's a file
and then sends it out to the network causes the data to be moved exactly
once: from disk to memory.  The rest of the "data movement" is done by
VM mappings and reference counting tricks.  This could mean significant
performance improvements for FTP, WWW, and Samba servers (NFS servers
can't take advantage of this yet; they use a different interface, which
I am going to address separately fairly soon).

Note that applications must be smart in order to take real advantage of
this feature.  In particular, an application must avoid writing to a buffer
immediately after the write(2) on the socket returns, since the loan of
the buffer may still be in effect (writing to the buffer would then cause
a copy-on-write fault).  Applications must also perform writes large enough
for the loaning code to kick in.

The right way for an application to take advantage of this would be
for it to mmap the file to be transmitted, possibly using a sliding window,
and then write the entire window with one system call.  The sosend_loan()
routine currently breaks it up internally into 64K chunks, which are then
referenced directly when a TCP segment is transmitted.
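
A minimal userland sketch of that pattern, for illustration only.  The
names send_file_mmap() and write_all() are hypothetical, and the sketch
assumes the window is a multiple of the page size (mmap offsets must be
page-aligned).  Each window is mapped read-only, handed to write(2) in a
single call where possible, and never modified afterwards, so no
copy-on-write fault is triggered while a loan may still be outstanding.

```c
#include <assert.h>
#include <errno.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all of buf, retrying on short writes and EINTR. */
static int
write_all(int sock, const char *buf, size_t len)
{
	while (len > 0) {
		ssize_t n = write(sock, buf, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		buf += n;
		len -= (size_t)n;
	}
	return 0;
}

/*
 * Hypothetical helper: send the whole file behind fd to sock by sliding
 * a read-only mmap window over it.  Returns 0 on success, -1 on error.
 */
int
send_file_mmap(int sock, int fd, size_t window)
{
	struct stat st;
	long pagesize = sysconf(_SC_PAGESIZE);

	/* Assumption: the caller passes a page-aligned window size. */
	assert(pagesize > 0 && window > 0 &&
	    window % (size_t)pagesize == 0);

	if (fstat(fd, &st) < 0)
		return -1;

	for (off_t off = 0; off < st.st_size; off += (off_t)window) {
		size_t len = window;
		void *p;
		int rc;

		/* The last window may be shorter than the rest. */
		if ((off_t)len > st.st_size - off)
			len = (size_t)(st.st_size - off);

		p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, off);
		if (p == MAP_FAILED)
			return -1;

		/* One large write per window; the buffer is never written to. */
		rc = write_all(sock, p, len);
		munmap(p, len);
		if (rc < 0)
			return -1;
	}
	return 0;
}
```

A caller would pick a window comfortably above the 4K threshold, for
example send_file_mmap(sock, fd, 1024 * 1024) on a system with 4K pages.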

Anyway, feel free to play around with this.  In fact, I encourage you to
do so, so that any bugs can be shaken out (and if you want to experiment
with it to improve performance, that's great, too :-).  Obviously, I would
like to make this the default someday :-)
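
For anyone wanting to try it, the two options named in the commit message
below would go into a kernel config file roughly as follows.  This is a
sketch that assumes the usual "options" syntax of a NetBSD kernel config;
the trailing comments are mine.

```
options 	SOSEND_LOAN		# page loaning for large socket writes
options 	SOSEND_COUNTERS		# optional instrumentation
```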

----- Forwarded message from Jason R Thorpe <thorpej@netbsd.org> -----

To: source-changes@netbsd.org
Date: Thu,  2 May 2002 20:55:55 +0300 (EEST)
From: Jason R Thorpe <thorpej@netbsd.org>
Subject: CVS commit: syssrc/sys


Module Name:	syssrc
Committed By:	thorpej
Date:		Thu May  2 17:55:52 UTC 2002

Modified Files:
	syssrc/sys/conf: files
	syssrc/sys/kern: uipc_socket.c
	syssrc/sys/sys: socketvar.h

Log Message:
Add some experimental page-loaning for writes on sockets.  It is disabled
by default, and can be enabled by adding the SOSEND_LOAN option to your
kernel config.  The SOSEND_COUNTERS option can be used to provide some
instrumentation.

Use of this option, combined with an application that does large enough
writes, gets us zero-copy on the TCP and UDP transmit path.


To generate a diff of this commit:
cvs rdiff -r1.523 -r1.524 syssrc/sys/conf/files
cvs rdiff -r1.63 -r1.64 syssrc/sys/kern/uipc_socket.c
cvs rdiff -r1.50 -r1.51 syssrc/sys/sys/socketvar.h

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.


----- End forwarded message -----

-- 
        -- Jason R. Thorpe <thorpej@wasabisystems.com>