Subject: RE: zbufs for NetBSD
To: None <thorpej@wasabisystems.com, david@l8s.co.uk>
From: None <kyle.unice@L-3com.com>
List: tech-kern
Date: 08/23/2002 07:51:37
The "zero-copy" in zero-copy mbufs means that no copies are required when
user-land receives or transmits data on the network. On a "recv", the NIC
receives the packet into a region of memory, and that memory is then mapped
into the process's address space; on a transmit, the memory is allocated
once and passed all the way to the NIC for transmission without being
copied.
Zero-copy mbufs get more advantageous as the size of the MTU increases.
NetWare file server performance is a good example of how ring zero operation
of a network application can make a difference in overall system
performance (e.g. a 486 running NetWare can support 1000 users, whereas
operating systems that run the server in ring 3 cannot).
I know this is probably old news to most of the engineers here, but I just
wanted to make clear what is meant by zero-copy.
Kyle
-----Original Message-----
From: Jason R Thorpe [mailto:thorpej@wasabisystems.com]
Sent: Thursday, August 22, 2002 5:42 PM
To: David Laight
Cc: tech-kern@netbsd.org; kyle.unice@l-3com.com
Subject: Re: zbufs for NetBSD
On Fri, Aug 23, 2002 at 12:17:37AM +0100, David Laight wrote:
> On Thu, Aug 22, 2002 at 01:04:37PM -0600, kyle.unice@L-3com.com wrote:
> > I imagine that someone previously has looked into putting zbuf
> > (zero-copy mbufs) support into NetBSD. I am interested in knowing what
> > the state of the project is. A search of the net reveals little except
> > for VxWorks support for zbufs.
> >
> > I would think that the MMU, socket syscalls, and mbuf code would need to
> > be modified. The upside is that zbufs provide a network performance
> > advantage.
Hm, zero-copy mbufs. If I understand you correctly, NetBSD has supported
those for a long time, really.
If an mbuf has "external storage" associated with it, a "copy" from
mbuf a to mbuf b merely causes mbuf b to take a reference to that
external storage.
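
As a rough illustration of the reference-taking "copy" described above, here
is a minimal user-space sketch. The names toy_mbuf, ext_buf, toy_alloc,
toy_copy and toy_free are made up for this example and are not the kernel's
real mbuf structures or functions; the only point is that the "copy" moves
no payload bytes and just bumps a reference count.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical shared buffer; NOT NetBSD's real external-storage layout. */
struct ext_buf {
    char  *data;
    size_t len;
    int    refcnt;      /* number of toy mbufs pointing at data */
};

/* Hypothetical, heavily simplified stand-in for an mbuf. */
struct toy_mbuf {
    struct ext_buf *ext;
};

/* Allocate a toy mbuf owning fresh external storage that holds src. */
static struct toy_mbuf *toy_alloc(const char *src, size_t len)
{
    struct toy_mbuf *m = malloc(sizeof(*m));
    struct ext_buf *e = malloc(sizeof(*e));
    char *d = malloc(len ? len : 1);

    if (m == NULL || e == NULL || d == NULL) {
        free(m);
        free(e);
        free(d);
        return NULL;
    }
    memcpy(d, src, len);        /* the only payload copy in this sketch */
    e->data = d;
    e->len = len;
    e->refcnt = 1;
    m->ext = e;
    return m;
}

/* "Copy" a into a new toy mbuf: share the storage, take a reference. */
static struct toy_mbuf *toy_copy(struct toy_mbuf *a)
{
    struct toy_mbuf *b = malloc(sizeof(*b));

    if (b == NULL)
        return NULL;
    b->ext = a->ext;
    b->ext->refcnt++;
    return b;
}

/* Drop one reference; the storage goes away with the last reference. */
static void toy_free(struct toy_mbuf *m)
{
    if (--m->ext->refcnt == 0) {
        free(m->ext->data);
        free(m->ext);
    }
    free(m);
}
```

After toy_copy(a), both toy mbufs point at the same data pointer, and the
storage is released only when the second one is freed.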
> For a unix system careful use of page loaning can help - but only
> if the process side doesn't write into the loaned page (because that
> would require a copy-on-write allocation which would end up being more
> expensive than the original copy).
Yes. And, in -current, NetBSD now uses "zero-copy mbufs"/page loaning
by default for writes >= 8k to a socket.
Yes, for this to be a major performance win, you need to either use
async i/o of some sort or, as you say, transmit an mmap'd file.
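
For the mmap'd-file case, the application side might look roughly like the
sketch below. send_mapped_file is a hypothetical helper written for this
example; it only uses standard mmap(2) and send(2). Whether the kernel
actually loans the pages instead of copying depends on the kernel and on the
write size (the >= 8k threshold mentioned above), so treat this as a sketch
of the calling pattern, not a guarantee of zero-copy behaviour.

```c
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: map the file at path read-only and send its
 * contents on sock. Returns the number of bytes sent, or -1 on error.
 */
static ssize_t send_mapped_file(int sock, const char *path)
{
    struct stat st;
    ssize_t total = 0;
    size_t len;
    char *p;
    int fd;

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }
    len = (size_t)st.st_size;

    p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;

    /* The process never writes to the mapping, so no COW is triggered. */
    while ((size_t)total < len) {
        ssize_t n = send(sock, p + total, len - (size_t)total, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted, retry */
            total = -1;
            break;
        }
        total += n;
    }

    munmap(p, len);
    return total;
}
```

The mapping is read-only and is handed to send() in as few calls as the
socket allows, which keeps each write as large as possible.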
Zero-copy receive is somewhat harder -- the data comes in chunks of less
than one page (more or less), and so you HAVE to copy the data a little
to coalesce it into nice page-sized/page-aligned pieces. However, once
that is done, you could certainly page-flip if given a nice
page-sized/page-aligned buffer for the receive. The threshold at which
this pays off is something that needs to be researched (once it's
implemented, obviously :-)
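
On the application side, handing the kernel a page-sized, page-aligned
receive buffer could look something like the following sketch.
alloc_page_aligned is a hypothetical helper for this example, built on
sysconf(3) and posix_memalign(3). No page-flipping is implied here; it only
shows a buffer shaped the way such a scheme would want it.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: allocate npages whole pages, aligned on a page
 * boundary, suitable for passing to recv(). Stores the byte length in
 * *len_out and returns the buffer, or NULL on failure.
 */
static void *alloc_page_aligned(size_t npages, size_t *len_out)
{
    long ps = sysconf(_SC_PAGESIZE);
    void *buf = NULL;
    size_t len;

    if (ps <= 0 || npages == 0)
        return NULL;
    if (npages > SIZE_MAX / (size_t)ps)
        return NULL;            /* length would overflow */
    len = npages * (size_t)ps;

    /* The page size is a power of two, as posix_memalign requires. */
    if (posix_memalign(&buf, (size_t)ps, len) != 0)
        return NULL;

    *len_out = len;
    return buf;
}
```

A caller would then pass the returned buffer and length straight to recv()
and release it with free().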
But, in any case, for sending, we're there today.
> The big gain from page loaning probably comes with mmap'd files - since
> the data can be transmitted without ever getting into the cpu cache (and
> displacing other useful data). Hardware checksum calculation will
> make a much bigger difference here...
>
> David
>
> --
> David Laight: david@l8s.co.uk
--
-- Jason R. Thorpe <thorpej@wasabisystems.com>