Subject: Re: Experimental zero-copy for TCP and UDP transmit-side
To: Jason R Thorpe <thorpej@wasabisystems.com>
From: Bang Jun-Young <junyoung@mogua.com>
List: current-users
Date: 05/08/2002 01:05:00
On Thu, May 02, 2002 at 11:32:24AM -0700, Jason R Thorpe wrote:
> Hi folks...
> 
> I have added some experimental (and optional) code that enables page
> loaning for large (>= 4K) writes to sockets.  Combined with a TCP fix
> I committed a few days ago, this gets us to zero-copy for the TCP transmit
> side.  In tests on an embedded system with limited memory bandwidth, TCP
> transmit performance on 100baseTX-FDX went from ~6500KB/s to ~11100KB/s,
> a significant improvement.
> 
> It's also worth noting that a server application that mmap's a file
> and then sends it out to the network causes the data to be moved exactly
> once: from disk to memory.  The rest of the "data movement" is done by
> VM mappings and reference counting tricks.  This could mean significant
> performance improvements for FTP, WWW, and Samba servers (NFS servers
> can't take advantage of this yet; they use a different interface, which
> I am going to address separately fairly soon).
> 
> Note that applications must be smart in order to take real advantage of
> this feature.  In particular, an application must avoid writing to a buffer
> immediately after the write(2) on the socket returns, since the loan of
> the buffer may still be in effect (writing to the buffer would then cause
> a copy-on-write fault).  Applications must also perform writes large enough
> for the loaning code to kick in.
> 
> The right way for an application to take advantage of this would be
> for it to mmap the file to be transmitted, possibly using a sliding window,
> and then write the entire window with one system call.  The sosend_loan()
> routine currently breaks it up internally into 64K chunks, which are then
> referenced directly when a TCP segment is transmitted.
> 
> Anyway, feel free to play around with this.  In fact, I encourage you to
> do so, so that any bugs can be shaken out (and if you want to experiment
> with it to improve performance, that's great, too :-) ... I obviously would
> like to make this default someday :-)
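
For illustration, the mmap-with-a-sliding-window pattern described above
might look roughly like the sketch below.  It is not taken from the
committed code; it uses only mmap(2) and write(2), and the names
write_all(), send_fd_windowed() and send_path_windowed() are made up for
this example.  It assumes a blocking socket and a window size that is a
multiple of the page size (64K would match the chunk size mentioned
above).  Whether the loan actually happens depends on the kernel; the
program behaves the same either way.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write all of buf, retrying on short writes/EINTR. */
static int write_all(int sock, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(sock, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Hypothetical helper: send the file open on fd to sock, one mmap'd
 * window at a time.  Each window is handed to write(2) in a single
 * call, and the mapping is read-only, so the application never
 * touches the buffer after the write returns.
 * Returns the number of bytes sent, or -1 on error.
 */
off_t send_fd_windowed(int sock, int fd, size_t window)
{
    struct stat st;
    long pagesz = sysconf(_SC_PAGESIZE);
    off_t off = 0;

    /* mmap offsets must be page-aligned, so the window must be too. */
    assert(pagesz > 0 && window > 0 && window % (size_t)pagesz == 0);

    if (fstat(fd, &st) < 0)
        return -1;

    while (off < st.st_size) {
        size_t len = window;
        if ((off_t)len > st.st_size - off)
            len = (size_t)(st.st_size - off);   /* final partial window */

        void *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, off);
        if (p == MAP_FAILED)
            return -1;

        int rc = write_all(sock, p, len);
        munmap(p, len);                         /* slide the window */
        if (rc < 0)
            return -1;
        off += (off_t)len;
    }
    return off;
}

/* Hypothetical convenience wrapper: open path, send it, close it. */
off_t send_path_windowed(int sock, const char *path, size_t window)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    off_t sent = send_fd_windowed(sock, fd, window);
    close(fd);
    return sent;
}
```

A caller would use something like send_path_windowed(sock, path, 65536).
The mapping is PROT_READ on purpose: the application cannot write to the
buffer by accident, so it cannot trigger the copy-on-write fault
mentioned above.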

Thanks for the great work! ;-)

BTW, I have a question: what are the advantages of this over sendfile(2)
implemented in Linux and FreeBSD?

Jun-Young

-- 
Bang Jun-Young <junyoung@mogua.com>