Subject: bpf & mmap..
To: None <tech-net@netbsd.org>
From: Darren Reed <avalon@caligula.anu.edu.au>
List: tech-net
Date: 05/02/2004 10:32:38
With it now being possible to configure bpf buffer sizes of 1MB or
more by default, I've had a quick look at what would be required to
make it support mmap, and gone on to consider this a little more in
terms of implementation cost.

First, writing bpfmmap itself seems to be ~12 lines of code or less,
provided bpfdesc.h and bpf.c are changed to keep pointers to both
buffers in fields other than bd_sbuf/bd_fbuf/bd_hbuf.
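
For what it's worth, a rough and untested sketch of what such a
bpfmmap entry point might look like follows.  It assumes the two
buffers are carved out of one contiguous allocation hung off a new
(purely hypothetical) bd_mmapbuf field, and the encoding of the
return value is machine-dependent, so treat it as illustration only:

paddr_t
bpfmmap(dev_t dev, off_t off, int prot)
{
	struct bpf_d *d = &bpf_dtab[minor(dev)];
	paddr_t pa;

	/* Only the contiguous 2 * bufsize capture region is mappable. */
	if (off < 0 || off >= d->bd_bufsize * 2)
		return ((paddr_t)-1);

	/* Translate the kernel virtual address of the requested page. */
	if (!pmap_extract(pmap_kernel(), (vaddr_t)d->bd_mmapbuf + off, &pa))
		return ((paddr_t)-1);

	return (atop(pa));	/* page-number encoding assumed here */
}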

The problem then becomes providing a mechanism for telling a program
what part of that buffer is valid for it to use.

If we look at how bpf currently works, we have 2 buffers, each of
size X.  With the current strategy, a program is entitled to backup
buffering equal to 2*X for packet data but can only read an amount X
at a time.  However, as soon as the read is complete, the kernel
again has 2*X to fill with data.  This falls down a bit while a read
is draining the hold buffer, as the kernel only has a single buffer
of size X to fill until that read completes.  This can be mitigated,
slightly, by allowing a read of 2*X to extract all of the data
available from both buffers.  This lets an application completely
drain the bpf buffers with a single read and gives the kernel 2*X of
buffering, at all times, between read calls to bpf.

Am I convinced this is a win?  Not completely.  If it takes a
program using bpf time t to process a buffer of size X, then it will
take 2*t to do 2*X, so whilst the read 'instantly' gives the kernel
more capacity to store packets, the application can't service it any
quicker.  This suggests that the only saving is halving the number
of system calls needed to read 2*X from bpf.  See below for a patch
that I think will make bpfread support this.
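
To illustrate the application side of a 2*X read, here is a rough
userland fragment (not from any real program).  It assumes the
records from the two kernel buffers end up back to back and still
BPF_WORDALIGN'd at the boundary, which is something the patch below
would need to guarantee:

#include <sys/types.h>
#include <sys/ioctl.h>
#include <net/bpf.h>

#include <err.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

int
main(void)
{
	struct bpf_hdr *bh;
	u_int bufsize;
	char *buf, *bp;
	ssize_t n;
	int fd;

	if ((fd = open("/dev/bpf0", O_RDONLY)) == -1)
		err(1, "open");
	/* ... BIOCSETIF, filter setup, etc. omitted ... */
	if (ioctl(fd, BIOCGBLEN, &bufsize) == -1)
		err(1, "BIOCGBLEN");
	if ((buf = malloc(bufsize * 2)) == NULL)
		err(1, "malloc");

	/* One system call now drains both kernel buffers. */
	if ((n = read(fd, buf, bufsize * 2)) == -1)
		err(1, "read");

	/* Walk the bpf_hdr records in whatever came back. */
	for (bp = buf; bp < buf + n; ) {
		bh = (struct bpf_hdr *)bp;
		/* packet data: bh->bh_caplen bytes at bp + bh->bh_hdrlen */
		bp += BPF_WORDALIGN(bh->bh_hdrlen + bh->bh_caplen);
	}

	free(buf);
	close(fd);
	return (0);
}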

To use mmap with bpf, a new strategy is required.  If we use a model
where (say) an ioctl is used to find out how much data is in the
mmap buffer, then between that system call being made and another to
tell the kernel the program has finished with the buffer, the kernel
only has 1*X of buffer to fill with data.  i.e. in order to obtain
the same level of packet buffering as you get with read, I believe
another buffer of size X would be required.  So rather than rotating
buffers between free/hold/store, you need free/hold/store plus
map-in-use to achieve the same level of buffering (or at least that
is my view, anyway).  To save on system calls, I'll assume that the
call used to find out about the next buffer is the same one used to
tell the kernel when you are finished with the buffer you currently
have.
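
To make that concrete, the sort of interface I have in mind might
look like the fragment below.  None of it exists today - the struct,
the ioctl name and the ioctl number are all made up for the sake of
illustration:

#include <sys/types.h>
#include <sys/ioccom.h>

/*
 * Hypothetical: a single ioctl that releases the buffer the program
 * has finished with and reports where the next chunk of valid data
 * sits within the mmap'd region.
 */
struct bpf_mapinfo {
	u_int	bm_ackoff;	/* in: offset of the buffer being released */
	u_int	bm_offset;	/* out: offset of the next buffer with data */
	u_int	bm_len;		/* out: valid bytes at bm_offset */
};

#define	BIOCNEXTBUF	_IOWR('B', 200, struct bpf_mapinfo)

A program would then loop: pass back the offset it has finished with
in bm_ackoff, issue the ioctl, and process bm_len bytes at
map + bm_offset before the next call.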

To summarise, I'm not entirely convinced that either of the above
changes has a lot to offer bpf in terms of performance.  I don't
currently see bpfmmap buying much, with the current buffer strategy,
unless the cost of moving bytes out of the kernel with uiomove is
large in comparison to everything else, and to achieve the same
level of buffering in the kernel, 3*X (rather than 2*X) of memory
would need to be allocated for each bpf device when it is opened -
or should I say, when it is configured for mmap use.  I could more
easily be convinced that allowing bpfread to read 2*bufsize is
worthwhile, but there appears to be only one small area where it
offers a saving.  At this point I've not measured any of these
changes.

Comments?
Darren

*** bpf.c	Sat Apr 24 12:43:37 2004
--- bpf.c.new	Sun May  2 10:17:59 2004
***************
*** 435,440 ****
--- 435,441 ----
  	int ioflag;
  {
  	struct bpf_d *d = &bpf_dtab[minor(dev)];
+ 	int dosbuftoo;
  	int timed_out;
  	int error;
  	int s;
***************
*** 443,449 ****
  	 * Restrict application to use a buffer the same size as
  	 * as kernel buffers.
  	 */
! 	if (uio->uio_resid != d->bd_bufsize)
  		return (EINVAL);
  
  	s = splnet();
--- 444,454 ----
  	 * Restrict application to use a buffer the same size as
  	 * as kernel buffers.
  	 */
! 	if (uio->uio_resid == d->bd_bufsize * 2)
! 		dosbuftoo = 1;
! 	else if (uio->uio_resid == d->bd_bufsize)
! 		dosbuftoo = 0;
! 	else
  		return (EINVAL);
  
  	s = splnet();
***************
*** 516,523 ****
  	 * we checked above that the read buffer is bpf_bufsize bytes.
  	 */
  	error = uiomove(d->bd_hbuf, d->bd_hlen, uio);
  
- 	s = splnet();
  	d->bd_fbuf = d->bd_hbuf;
  	d->bd_hbuf = 0;
  	d->bd_hlen = 0;
--- 521,544 ----
  	 * we checked above that the read buffer is bpf_bufsize bytes.
  	 */
  	error = uiomove(d->bd_hbuf, d->bd_hlen, uio);
+ 	if (error == 0 && dosbuftoo) {
+ 		s = splnet();
+ 		if (d->bd_slen != 0) {
+ 			/*
+ 			 * Return the hold buffer just copied out before
+ 			 * rotating, so the store slot is never left empty.
+ 			 */
+ 			d->bd_fbuf = d->bd_hbuf;
+ 			d->bd_hbuf = 0;
+ 			d->bd_hlen = 0;
+ 			ROTATE_BUFFERS(d);
+ 			splx(s);
+ 			error = uiomove(d->bd_hbuf, d->bd_hlen, uio);
+ 			s = splnet();
+ 		}
+ 	} else
+ 		s = splnet();
  
  	d->bd_fbuf = d->bd_hbuf;
  	d->bd_hbuf = 0;
  	d->bd_hlen = 0;