Subject: Re: sendfile support in NetBSD
To: Thor Lancelot Simon <tls@rek.tjls.com>
From: Bill Studenmund <wrstuden@netbsd.org>
List: netbsd-users
Date: 02/28/2007 10:31:43

On Tue, Feb 27, 2007 at 09:58:09PM -0500, Thor Lancelot Simon wrote:
> On Tue, Feb 27, 2007 at 05:05:46PM -0800, Bill Studenmund wrote:
> >
> > I agree that the same functionality can be achieved. However there are
> > three things that I see in favor of having a sendfile() or a splice().
>
> I certainly agree that we should have splice().
>
> sendfile() is an API botch.  We should not propagate it; nor should we
> mistake an _asynchronous_ file splicer for a _synchronous_ special-purpose
> hack.

I don't see how it's an API botch. The definition I saw, the one in
DragonFly, looks rather sane, so please explain. You obviously feel
strongly about it, but you haven't said much other than to express strong
feelings.

1) I agree that if splice() existed first, sendfile() would have had little
reason to exist. But sendfile() does exist. AFAIK everyone uses the same
function signature, so from the compatibility/portability point of view,
it's a good thing to have.
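For concreteness, here is a minimal sketch of what an application-side sendfile() call looks like. Since this thread is about adding sendfile() to NetBSD, the sketch uses the Linux prototype, sendfile(out_fd, in_fd, off_t *offset, count), as an assumption. Note that the BSD-family version (FreeBSD, DragonFly) takes a different argument list, including header/trailer vectors and flags, so this code is Linux-specific. The helper name send_whole_file() is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/sendfile.h>   /* Linux-only header */
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * send_whole_file() is a hypothetical helper, not a NetBSD API.
 * It copies the regular file at `path` to descriptor `sock` with the
 * Linux sendfile(out_fd, in_fd, &offset, count) call.
 * Returns the number of bytes sent, or -1 on error.
 */
static off_t send_whole_file(const char *path, int sock)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }

    /* Cap each call so the count always fits in size_t, even on 32-bit. */
    const off_t max_chunk = (off_t)1 << 30;

    /* sendfile() advances `offset` by the bytes it sent, so a short
     * send simply loops and resumes where it left off. */
    off_t offset = 0;
    while (offset < st.st_size) {
        off_t left = st.st_size - offset;
        size_t count = (size_t)(left < max_chunk ? left : max_chunk);
        ssize_t n = sendfile(sock, fd, &offset, count);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            close(fd);
            return -1;
        }
        if (n == 0)          /* early EOF: the file shrank under us */
            break;
    }

    close(fd);
    return offset;
}
```

The kernel does the file-to-socket copy in one call; the application never maps or reads the data itself.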

We have a number of calls in our kernel which duplicate functionality, and
we have others which are conveniences to the application programmer. I
agree that our zero-copy TCP send _could_ do the same thing by just
mapping a file. However, I do not see why that makes sendfile() bad. So
what if there's another way?

2) It's not clear to me that mmapping a whole file and feeding it to a
write system call scales well. For an app running in a 32-bit environment
(either a 32-bit CPU or a 32-bit compilation of the app), the program's VM
size limits the file transfer size. The VM size also limits the total
transfer size, since any files being sent at once have to be mapped into
the same VM space.
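A minimal sketch of the mmap-and-write alternative, in ordinary POSIX C, may make the address-space cost concrete. The helper name mmap_and_write() is hypothetical, and whether the kernel avoids copying the mapped pages depends on its zero-copy send support, not on this code. The point is that the entire file occupies the process's address space for the whole transfer, and a file larger than SIZE_MAX cannot be mapped in one piece at all.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * mmap_and_write() is a hypothetical helper. It maps the entire file at
 * `path` and passes the mapping to write(2) on `sock`. Until munmap(),
 * st_size bytes of this process's address space are tied up.
 * Returns the number of bytes written, or -1 on error.
 */
static ssize_t mmap_and_write(const char *path, int sock)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {             /* mmap() rejects a zero length */
        close(fd);
        return 0;
    }
    /* With a 32-bit size_t, a file over 4 GiB cannot be mapped whole. */
    if ((uintmax_t)st.st_size > SIZE_MAX) {
        close(fd);
        errno = EFBIG;
        return -1;
    }

    size_t len = (size_t)st.st_size;
    void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }
    close(fd);                         /* the mapping outlives the descriptor */

    /* Pages are faulted in on demand as write() reads from the mapping. */
    const char *p = map;
    size_t done = 0;
    while (done < len) {
        ssize_t n = write(sock, p + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            int saved = errno;
            munmap(map, len);
            errno = saved;
            return -1;
        }
        done += (size_t)n;
    }

    munmap(map, len);
    return (ssize_t)done;
}
```

Serving N files concurrently this way needs the sum of their sizes in free address space, which is the limit described above for 32-bit processes.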

I have a strong negative reaction to the idea of reactive i/o scheduling.
Having the VM system react to page faults triggered by the socket system
strikes me as sub-optimal. Yes, it's better than not doing it, but if we
know we are going to do a whole transfer, let's be as proactive as we can.
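From user space, the closest an mmap-based sender can get to being proactive is to give the VM system advice before the transfer starts. The sketch below assumes the madvise(2) flags MADV_SEQUENTIAL and MADV_WILLNEED, which both Linux and NetBSD provide; the helper name map_with_readahead() is hypothetical. This is only a hint layered on the reactive path, not the kernel-side up-front scheduling argued for here.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * map_with_readahead() is a hypothetical helper. It maps `path`
 * read-only and then tells the VM system up front that the whole range
 * will be read sequentially, rather than letting it discover that one
 * page fault at a time. Returns the mapping and stores its length in
 * *lenp, or returns MAP_FAILED on error or for an empty file.
 */
static void *map_with_readahead(const char *path, size_t *lenp)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return MAP_FAILED;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0 ||
        (uintmax_t)st.st_size > SIZE_MAX) {
        close(fd);
        return MAP_FAILED;
    }

    size_t len = (size_t)st.st_size;
    void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (map == MAP_FAILED)
        return MAP_FAILED;

    /* Both calls are advice only; ignoring a failure is harmless. */
    (void)madvise(map, len, MADV_SEQUENTIAL);  /* expect sequential access */
    (void)madvise(map, len, MADV_WILLNEED);    /* start reading pages in now */

    *lenp = len;
    return map;
}
```

The caller would then write() from the mapping as before and munmap() it when done.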

Also, if we have TCP offload (checksumming and segmentation), I think we
never actually need the data in either kernel or user VM space. So why
build mappings, especially megabytes or gigabytes of them, when we don't
need them?

Take care,

Bill
