Subject: Re: zbufs for NetBSD
To: David Laight <david@l8s.co.uk>
From: Jason R Thorpe <thorpej@wasabisystems.com>
List: tech-kern
Date: 08/22/2002 16:42:12
On Fri, Aug 23, 2002 at 12:17:37AM +0100, David Laight wrote:
> On Thu, Aug 22, 2002 at 01:04:37PM -0600, kyle.unice@L-3com.com wrote:
> > I imagine that someone previously has looked into putting zbuf (zero-copy
> > mbufs) support into NetBSD. I am interested in knowing what the state of
> > the project is. A search of the net reveals little except for VxWorks
> > support for zbufs.
> >
> > I would think that the MMU, socket syscalls, and mbuf code would need to be
> > modified. The upside is that zbufs provide a network performance advantage.
Hm, zero-copy mbufs. If I understand you correctly, NetBSD has supported
those for a long time, really.
If an mbuf has "external storage" associated with it, a "copy" from
mbuf a to mbuf b merely causes mbuf b to take a reference to that
external storage.
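
To illustrate the reference-taking "copy" described above, here is a minimal user-space sketch. It is not NetBSD's actual struct mbuf or its macros; every name in it (struct ext_buf, struct xmbuf, xm_attach, xm_share, xm_release) is hypothetical and exists only to show the idea of reference-counted external storage.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified model; NOT the kernel's struct mbuf. */
struct ext_buf {
    char  *data;    /* the shared payload */
    size_t len;     /* payload length in bytes */
    int    refcnt;  /* number of mbufs pointing at this storage */
};

struct xmbuf {
    struct ext_buf *ext;  /* NULL when no external storage is attached */
};

/* Attach newly allocated external storage holding a copy of src. */
static int xm_attach(struct xmbuf *m, const char *src, size_t len)
{
    struct ext_buf *e = malloc(sizeof(*e));
    if (e == NULL)
        return -1;
    e->data = malloc(len);
    if (e->data == NULL) {
        free(e);
        return -1;
    }
    memcpy(e->data, src, len);
    e->len = len;
    e->refcnt = 1;
    m->ext = e;
    return 0;
}

/* "Copy" mbuf a to mbuf b: no payload is copied, b takes a reference. */
static void xm_share(struct xmbuf *b, const struct xmbuf *a)
{
    assert(a->ext != NULL);
    a->ext->refcnt++;
    b->ext = a->ext;
}

/* Drop one reference; the storage is freed when the last one goes away. */
static void xm_release(struct xmbuf *m)
{
    struct ext_buf *e = m->ext;
    m->ext = NULL;
    if (e != NULL && --e->refcnt == 0) {
        free(e->data);
        free(e);
    }
}
```

After xm_share(&b, &a), both a and b point at the same payload and the count is 2; the data is freed only when both have been released. A real kernel would also need locking or atomic counts, which this sketch leaves out.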
> For a unix system careful use of page loaning can help - but only
> if the process side doesn't write into the loaned page (because that
> would require a copy-on-write allocation which would end up being more
> expensive than the original copy).
Yes. And, in -current, NetBSD now uses "zero-copy mbufs"/page loaning
by default for writes >= 8k to a socket.
Yes, for this to be a major performance win, you need to either use
async i/o of some sort or, as you say, transmit an mmap'd file.
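
As a rough user-side sketch of the "transmit an mmap'd file" case: map the file read-only and hand the mapping straight to write(2) on the socket. The helper name send_mapped_file is hypothetical, and whether the kernel actually loans the pages instead of copying them depends on the kernel version and the write size (>= 8k in -current, per the above); the code only arranges things so that it can.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: send the whole file at `path` to descriptor `sock`.
 * Returns the number of bytes written, or -1 on error.
 */
static ssize_t send_mapped_file(int sock, const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {      /* mmap() rejects a zero-length mapping */
        close(fd);
        return 0;
    }
    size_t len = (size_t)st.st_size;

    /* Read-only mapping: the process cannot dirty the pages through it,
     * so no copy-on-write is triggered from this side. */
    char *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;

    size_t off = 0;
    while (off < len) {
        /* One large write gives the kernel the chance to loan pages. */
        ssize_t n = write(sock, p + off, len - off);
        if (n <= 0) {           /* EINTR handling omitted for brevity */
            munmap(p, len);
            return -1;
        }
        off += (size_t)n;
    }

    munmap(p, len);
    return (ssize_t)off;
}
```

Because the mapping is never written by the process, this avoids the copy-on-write case mentioned in the quoted text.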
Zero-copy receive is somewhat harder -- the data comes in chunks of less
than one page (more or less), and so you HAVE to copy the data a little
to coalesce it into nice page-sized/page-aligned pieces. However, once
that is done, you could certainly page-flip if given a nice page-sized/
page-aligned buffer for the receive. The threshold of where this has a payoff
is something that needs to be researched (once it's implemented, obviously :-).
But, in any case, for sending, we're there today.
> The big gain from page loaning probably comes with an mmap'd file - since
> the data can be transmitted without ever getting into the cpu cache (and
> displacing other useful data). Hardware checksum calculation will
> make a much bigger difference here...
>
> David
>
> --
> David Laight: david@l8s.co.uk
--
-- Jason R. Thorpe <thorpej@wasabisystems.com>