Subject: sys_sendfile.
To: 'tech-kern@netbsd.org' <tech-kern@netbsd.org>
From: Jeff Roberson <jroberson@aventail.com>
List: tech-kern
Date: 04/03/2000 16:29:00
I did a quick implementation of a Linux-compatible sendfile().  For those of
you who aren't familiar with it, I'll give a brief description.  Sendfile is
primarily useful for client/server applications that need to transfer files
(ftp/http/etc).  I believe Apache will use it on Linux.  Sendfile copies
data from one file descriptor to another inside the kernel instead of in the
application.  This avoids the overhead of copying data from the kernel to
the application on read and again in the opposite direction on write.  With
sendfile the application needs only to open the two descriptors and call
sendfile with them, the length of the data, and the offset from the
beginning of the source file.
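
To make the calling convention concrete, here is a minimal user-space
sketch against the Linux prototype from <sys/sendfile.h>,
sendfile(out_fd, in_fd, &offset, count).  The helper name
send_file_range() is made up for this example, and the retry loop is my
own assumption about how a caller would handle short transfers.  Older
Linux kernels may also require out_fd to be a socket.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/sendfile.h>	/* Linux-specific header */
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of any library): send `count` bytes of the
 * file at `path`, starting at `offset`, to the already open descriptor
 * out_fd.  Returns the number of bytes sent, or -1 on error.
 */
ssize_t
send_file_range(int out_fd, const char *path, off_t offset, size_t count)
{
	size_t total = 0;
	int in_fd;

	assert(path != NULL);
	in_fd = open(path, O_RDONLY);
	if (in_fd < 0)
		return -1;

	while (total < count) {
		/* The kernel advances `offset` by the number of bytes sent. */
		ssize_t n = sendfile(out_fd, in_fd, &offset, count - total);

		if (n == 0)
			break;		/* reached end of file */
		if (n < 0) {
			if (errno == EINTR)
				continue;
			close(in_fd);
			return -1;
		}
		total += (size_t)n;
	}
	close(in_fd);
	return (ssize_t)total;
}
```

A server would typically call this with a connected socket as out_fd; no
user-space buffer appears anywhere in the caller.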

What I have implemented so far is what I call the 'two copy pseudo
sendfile', which gives little performance gain.  I started this way to make
sure I had written an interface that is compatible with Linux.  Basically I
do the same thing a user space application would:  I allocate a buffer and
call dofileread() and dofilewrite() in a loop.
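
For comparison, the user-space equivalent of this two-copy loop looks
roughly like the sketch below.  It uses read() and write() where the kernel
version uses dofileread() and dofilewrite(); the function name
two_copy_transfer() and the 8 KB buffer size are assumptions chosen for
illustration, not what the kernel code uses.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define COPY_BUF_SIZE 8192	/* arbitrary illustrative buffer size */

/*
 * Hypothetical helper: copy up to `count` bytes from in_fd to out_fd
 * through an intermediate buffer.  Returns the number of bytes copied,
 * or -1 on error.
 */
ssize_t
two_copy_transfer(int out_fd, int in_fd, size_t count)
{
	char *buf;
	size_t total = 0;

	assert(out_fd >= 0 && in_fd >= 0);
	buf = malloc(COPY_BUF_SIZE);
	if (buf == NULL)
		return -1;

	while (total < count) {
		size_t want = count - total;
		ssize_t nread, off = 0;

		if (want > COPY_BUF_SIZE)
			want = COPY_BUF_SIZE;

		/* First copy: source descriptor -> intermediate buffer. */
		nread = read(in_fd, buf, want);
		if (nread == 0)
			break;		/* end of input */
		if (nread < 0) {
			if (errno == EINTR)
				continue;
			free(buf);
			return -1;
		}

		/* Second copy: buffer -> destination, retrying short writes. */
		while (off < nread) {
			ssize_t nwritten = write(out_fd, buf + off,
			    (size_t)(nread - off));

			if (nwritten < 0) {
				if (errno == EINTR)
					continue;
				free(buf);
				return -1;
			}
			off += nwritten;
		}
		total += (size_t)nread;
	}
	free(buf);
	return (ssize_t)total;
}
```

Every byte crosses the intermediate buffer twice, once in each direction,
which is the cost the name 'two copy' refers to.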

This is not exactly an optimal solution.  What I would like to do is query
the object that I'm reading from, in an abstract/clean/correct manner, for
the address of the first contiguous block of memory and the length of that
block.  Then I could pass this address and length to dofilewrite().  After
that I would like to be able to call the device's pager to page in the next
block that I need and get its address/length.  This would give a real
performance gain because you would be copying directly from one
(vnode|socket) to another (vnode|socket) instead of going through an
intermediate buffer.  Is this the way I should go about doing this?  Can
anyone give me some advice or point me in the right direction?  Admittedly
I haven't dug very deep to find the answers on my own, but I thought
perhaps someone on the list could save me some time.  Any help would be
appreciated.
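
As a rough user-space analogy for the single-copy idea (not the in-kernel
mechanism I am asking about), the source file can be mapped with mmap() and
the mapped address and length handed straight to write().  The helper name
mapped_transfer() is hypothetical, and mapping the whole file at once is a
simplifying assumption; the pages are still faulted in by the pager behind
the scenes.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write the whole file at `path` to out_fd directly
 * from a read-only mapping, with no intermediate buffer.
 * Returns the number of bytes written, or -1 on error.
 */
ssize_t
mapped_transfer(int out_fd, const char *path)
{
	struct stat st;
	char *base;
	size_t len, off = 0;
	int in_fd;

	assert(path != NULL);
	in_fd = open(path, O_RDONLY);
	if (in_fd < 0)
		return -1;
	if (fstat(in_fd, &st) < 0) {
		close(in_fd);
		return -1;
	}
	len = (size_t)st.st_size;
	if (len == 0) {
		close(in_fd);
		return 0;	/* mmap() rejects zero-length mappings */
	}

	base = mmap(NULL, len, PROT_READ, MAP_PRIVATE, in_fd, 0);
	close(in_fd);		/* the mapping stays valid after close */
	if (base == MAP_FAILED)
		return -1;

	/* Pass the mapped address and remaining length straight to write(). */
	while (off < len) {
		ssize_t n = write(out_fd, base + off, len - off);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			munmap(base, len);
			return -1;
		}
		off += (size_t)n;
	}
	munmap(base, len);
	return (ssize_t)off;
}
```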

Thanks,
Jeff