tech-kern: RE: zbufs for NetBSD

Subject: RE: zbufs for NetBSD
To: None <thorpej@wasabisystems.com, david@l8s.co.uk>
From: None <kyle.unice@L-3com.com>
List: tech-kern
Date: 08/23/2002 07:51:37
The "zero-copy" in zero-copy mbufs means that no copies are required when
user-land receives or transmits data on the network.  On a "recv", the NIC
receives the packet into a region of memory and that memory is mapped into
the process's address space; on a transmit, the memory is allocated once and
passed all the way to the NIC for transmission without being copied.

Zero-copy mbufs get more advantageous as the size of the MTU increases.
NetWare file server performance is a good example of how ring zero operation
of a network application can make a difference in overall system
performance (e.g. a 486 running NetWare can support 1000 users, whereas
operating systems that run such applications in ring 3 cannot).

I know this is probably old news to most of the engineers here, but I just
wanted to make clear what is meant by zero-copy.
Kyle

-----Original Message-----
From: Jason R Thorpe [mailto:thorpej@wasabisystems.com]
Sent: Thursday, August 22, 2002 5:42 PM
To: David Laight
Cc: tech-kern@netbsd.org; kyle.unice@l-3com.com
Subject: Re: zbufs for NetBSD


On Fri, Aug 23, 2002 at 12:17:37AM +0100, David Laight wrote:

 > On Thu, Aug 22, 2002 at 01:04:37PM -0600, kyle.unice@L-3com.com wrote:
 > > I imagine that someone previously has looked into putting zbuf
 > > (zero-copy mbufs) support into NetBSD.  I am interested in knowing
 > > what the state of the project is.  A search of the net reveals little
 > > except for VxWorks support for zbufs.
 > >
 > > I would think that the MMU, socket syscalls, and mbuf code would need
 > > to be modified.  The upside is that zbufs provide a network
 > > performance advantage.

Hm, zero-copy mbufs.  If I understand you correctly, NetBSD has supported
those for a long time, really.

If an mbuf has "external storage" associated with it, a "copy" from
mbuf a to mbuf b merely causes mbuf b to take a reference to that
external storage.
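
To make the reference-taking concrete, here is a minimal user-space sketch.
Everything in it (struct toy_ext, struct toy_mbuf, toy_ext_attach, toy_share,
toy_release) is a hypothetical stand-in invented for illustration; it is not
the real struct mbuf or the kernel's mbuf API, only the same idea in
miniature: a "copy" bumps a reference count instead of moving payload bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified types; NOT the real NetBSD struct mbuf. */
struct toy_ext {
	char	*buf;		/* shared payload */
	size_t	 size;		/* bytes in buf */
	int	 refcnt;	/* how many toy_mbufs point at buf */
};

struct toy_mbuf {
	struct toy_ext	*ext;	/* external storage, NULL if none */
	size_t		 len;	/* bytes of ext->buf this mbuf covers */
};

/* Give m freshly allocated external storage holding a copy of data. */
static int
toy_ext_attach(struct toy_mbuf *m, const void *data, size_t len)
{
	struct toy_ext *e;

	assert(len > 0);
	e = malloc(sizeof(*e));
	if (e == NULL)
		return -1;
	e->buf = malloc(len);
	if (e->buf == NULL) {
		free(e);
		return -1;
	}
	memcpy(e->buf, data, len);
	e->size = len;
	e->refcnt = 1;
	m->ext = e;
	m->len = len;
	return 0;
}

/* "Copy" a to b: no payload bytes move, b just takes a reference. */
static void
toy_share(struct toy_mbuf *b, const struct toy_mbuf *a)
{
	assert(a->ext != NULL);
	b->ext = a->ext;
	b->len = a->len;
	b->ext->refcnt++;
}

/* Drop m's reference; the storage is freed with the last reference. */
static void
toy_release(struct toy_mbuf *m)
{
	assert(m->ext != NULL && m->ext->refcnt > 0);
	if (--m->ext->refcnt == 0) {
		free(m->ext->buf);
		free(m->ext);
	}
	m->ext = NULL;
	m->len = 0;
}
```

After toy_share(&b, &a), both a and b point at the same buffer and the
count is 2; the buffer goes away only when both have been released.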

 > For a unix system careful use of page loaning can help - but only
 > if the process side doesn't write into the loaned page (because that
 > would require a copy-on-write allocation which would end up being more
 > expensive than the original copy).

Yes.  And, in -current, NetBSD now uses "zero-copy mbufs"/page loaning
by default for writes >= 8k to a socket.

Yes, for this to be a major performance win, you need to either use
async i/o of some sort or, as you say, transmit an mmap'd file.
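
From the application side, the mmap'd-file case might look roughly like the
sketch below.  send_mapped_file is a hypothetical helper name, and the code
only uses the standard open/fstat/mmap/write calls; whether the kernel
actually loans the pages rather than copying them is decided inside the
kernel and is not something this code can request or observe.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: map the file at path read-only and write the
 * whole mapping to sock.  Returns the number of bytes written, or -1
 * on error.  The process never writes to the mapping, so a loaned
 * page would never need a copy-on-write fault.
 */
static ssize_t
send_mapped_file(int sock, const char *path)
{
	struct stat st;
	size_t len, done = 0;
	char *p;
	int fd;

	fd = open(path, O_RDONLY);
	if (fd == -1)
		return -1;
	if (fstat(fd, &st) == -1 || st.st_size <= 0) {
		close(fd);
		return -1;
	}
	len = (size_t)st.st_size;
	p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	close(fd);		/* the mapping stays valid after close */
	if (p == MAP_FAILED)
		return -1;

	/* Hand the mapping to the socket in as few large writes as possible. */
	while (done < len) {
		ssize_t n = write(sock, p + done, len - done);

		if (n == -1) {
			if (errno == EINTR)
				continue;
			munmap(p, len);
			return -1;
		}
		done += (size_t)n;
	}
	munmap(p, len);
	assert(done == len);
	return (ssize_t)done;
}
```

Keeping each write large matters here, since the page-loaning path
described above only applies to writes of 8k or more.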

Zero-copy receive is somewhat harder -- the data comes in chunks of less
than one page (more or less), and so you HAVE to copy the data a little
to coalesce it into nice page-sized/page-aligned pieces.  However, once
that is done, you could certainly page-flip if given a nice page-sized/
page-aligned buffer for the receive.  The threshold at which this pays off
is something that needs to be researched (once it's implemented, obviously
:-)
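
As a rough user-space illustration of the two steps (coalesce small chunks
into a page-aligned page, then check whether a buffer is eligible for a
flip), consider the sketch below.  struct chunk, page_flippable and
coalesce_to_page are hypothetical names made up for this example; nothing
like this exists in the tree, and the actual page remapping is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical description of one received chunk (smaller than a page). */
struct chunk {
	const void	*data;
	size_t		 len;
};

/* True if buf/len cover whole, aligned pages, so a flip would be possible. */
static int
page_flippable(const void *buf, size_t len, size_t pagesz)
{
	return len != 0 && (uintptr_t)buf % pagesz == 0 && len % pagesz == 0;
}

/*
 * Copy n chunks back to back into one freshly allocated, page-aligned
 * page.  Returns the page (caller frees it) and stores the number of
 * bytes copied in *used, or returns NULL if the chunks do not fit in
 * one page or the allocation fails.
 */
static void *
coalesce_to_page(const struct chunk *c, size_t n, size_t pagesz, size_t *used)
{
	void *page;
	size_t off = 0, i;

	if (posix_memalign(&page, pagesz, pagesz) != 0)
		return NULL;
	for (i = 0; i < n; i++) {
		if (c[i].len > pagesz - off) {
			free(page);
			return NULL;
		}
		memcpy((char *)page + off, c[i].data, c[i].len);
		off += c[i].len;
	}
	*used = off;
	assert(page_flippable(page, pagesz, pagesz));
	return page;
}
```

The memcpy in the loop is the unavoidable "copy the data a little" part;
the open question is how large a receive has to be before the flip saves
more than that copy costs.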

But, in any case, for sending, we're there today.

 > The big gain from page loaning probably comes with an mmap'd file - since
 > the data can be transmitted without ever getting into the cpu cache (and
 > displacing other useful data).  Hardware checksum calculation will
 > make a much bigger difference here...
 > 
 > 	David
 > 
 > -- 
 > David Laight: david@l8s.co.uk

-- 
        -- Jason R. Thorpe <thorpej@wasabisystems.com>