Subject: Re: sendfile support in NetBSD
To: Thor Lancelot Simon <tls@rek.tjls.com>
From: Bill Studenmund <wrstuden@netbsd.org>
List: netbsd-users
Date: 02/27/2007 17:05:46

On Tue, Feb 27, 2007 at 07:09:58PM -0500, Thor Lancelot Simon wrote:
> On Tue, Feb 27, 2007 at 06:38:58PM -0500, Perry E. Metzger wrote:
> >
> > I think the question was "is there an equivalent of the Linux
> > sendfile machinery". The answer is, no, NetBSD doesn't have it (though
> > FreeBSD does).
>
> I'm sorry, I strongly disagree with your answer.  The purpose of
> sendfile() is to send a file out a socket without requiring it to
> be copied from the application into the kernel.  But zero-copy TCP
> send already achieves that, with no need for any new API.
>
> Because it causes the calling process to block, sendfile() in fact is
> no more efficient than a single write() of a mmap()ed region corresponding
> to the entire file.  It is a duplicative and bogus API and I am glad it
> is not present in NetBSD, where we provide a way to get the same or
> better performance characteristics without requiring use of a nonstandard
> and ill-conceived extension.

Well, I disagree with you. :-)

I agree that the same functionality can be achieved. However, there are
three things that I see in favor of having a sendfile() or a splice().

1) Other systems have it. So there is an advantage for portability.

2) How much code does it take to make it work? sendfile, AFAIK, just needs
file descriptors. mmap followed by write needs more code. Not much, you
might say, but what do we do if it's a big file? On our 32-bit
architectures, there's a limit to the virtual address (VA) space we can
support. On our 64-bit architectures, for a multi-GB file, do we really
want to mmap gigabytes of address space just to use it once for a write?
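To make the code-size comparison concrete, here is a minimal sketch of
the mmap()-plus-write() approach from the quoted message, assuming a
POSIX system. The helper name send_file_mmap() is hypothetical. Note that
it maps the entire file at once, which is exactly the address-space cost
raised in point 2. On Linux the core of the job is a single call such as
sendfile(sock, fd, &off, count), although that call can also return a
short count and is usually wrapped in a loop too.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Send the whole of regular file `fd` to `sock` by mapping it and
 * handing the mapping to write().  Returns 0 on success, or -1 with
 * errno set.  (Hypothetical helper, for illustration only.)
 */
static int send_file_mmap(int sock, int fd)
{
    struct stat st;
    if (fstat(fd, &st) == -1)
        return -1;
    if (st.st_size == 0)
        return 0;               /* nothing to send; mmap() rejects length 0 */

    /* The entire file must fit in the virtual address space at once. */
    if ((unsigned long long)st.st_size > (unsigned long long)(size_t)-1) {
        errno = EFBIG;
        return -1;
    }
    size_t len = (size_t)st.st_size;

    void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        return -1;

    /* A socket write() may be partial, so keep going until done. */
    const char *p = map;
    size_t done = 0;
    while (done < len) {
        ssize_t n = write(sock, p + done, len - done);
        if (n == -1) {
            if (errno == EINTR)
                continue;
            int saved = errno;
            (void)munmap(map, len);
            errno = saved;
            return -1;
        }
        done += (size_t)n;
    }
    return munmap(map, len);
}
```

On a 32-bit port, a file larger than 4 GB fails the size_t check before
mmap() is even tried, and a smaller file can still fail with ENOMEM if
there isn't a large enough hole in the address space.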

3) Using an mmap'd file as the tcp source may well give us poor i/o
performance. The problem is that all of our i/o is reactively scheduled:
it gets scheduled via faults on the mapping. It would be much more direct
to have a thread that schedules some i/o from disk, schedules some more,
and then sends the first chunk out the tcp socket while the later reads
are still in flight. By having active control, we can make better
decisions about what to do.

I admit the main place I experienced this was with a server that made 64k
burst transfers, which isn't the normal sendfile usage.

Take care,

Bill
