Subject: mmap(2) interface for bpf
To: None <tech-net@netbsd.org>
From: Darren Reed <avalon@caligula.anu.edu.au>
List: tech-net
Date: 05/21/2004 19:22:24
Thinking about the white paper recently circulated about packet
capture performance, I made some changes to bpf to implement an
mmap interface.  The results of a quick test don't show mind-blowing
performance benefits (and you would expect something to be wrong
if they did, given bpf's architecture.)  So far the implementation
only supports packet capture from bpf, not packet sending, and the
bpf device can only be mapped read-only.
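
To give an idea of what the consumer side looks like, here is a rough
sketch of how a program might map the capture buffer.  The ioctls are
the standard bpf ones; exactly how the mapped region is laid out and
kept in sync with the hold/store rotation is in the diffs and glossed
over here, so treat this as a sketch rather than the real interface:

#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <net/bpf.h>

#include <err.h>
#include <fcntl.h>

int
main(void)
{
	int fd;
	u_int blen;
	void *buf;

	fd = open("/dev/bpf0", O_RDONLY);
	if (fd == -1)
		err(1, "open");

	/* Ask bpf how big its capture buffer is. */
	if (ioctl(fd, BIOCGBLEN, &blen) == -1)
		err(1, "BIOCGBLEN");

	/* Read-only mapping; the device refuses PROT_WRITE. */
	buf = mmap(NULL, blen, PROT_READ, MAP_SHARED, fd, 0);
	if (buf == MAP_FAILED)
		err(1, "mmap");

	/* ... attach to an interface and consume packets from buf ... */
	return 0;
}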

Writing a simple program to read ~10MB from /dev/bpf? in 1MB chunks
and then write that to /dev/null, traditional read/write took a total
time of 2.46 seconds and mmap/write 2.40 seconds.  The most notable
drop was in the amount of system time - from an average of 0.04s to
an average of 0.00s.
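
The read/write side of the test is no more than a loop like the
following (interface attachment with BIOCSETIF, filter setup and so
on are left out; this is just the shape of it, not the exact program
in the tarball):

#include <sys/types.h>
#include <sys/ioctl.h>
#include <net/bpf.h>

#include <err.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#define CHUNK	(1024 * 1024)		/* 1MB reads */
#define TOTAL	(10 * CHUNK)		/* stop after ~10MB */

int
main(void)
{
	char *buf;
	int bpf, null;
	u_int blen = CHUNK;
	ssize_t n;
	size_t done = 0;

	bpf = open("/dev/bpf0", O_RDONLY);
	null = open("/dev/null", O_WRONLY);
	if (bpf == -1 || null == -1)
		err(1, "open");

	/* bpf wants the read size to match its buffer size. */
	if (ioctl(bpf, BIOCSBLEN, &blen) == -1)
		err(1, "BIOCSBLEN");

	if ((buf = malloc(blen)) == NULL)
		err(1, "malloc");

	/* ... BIOCSETIF etc. omitted ... */

	while (done < TOTAL) {
		n = read(bpf, buf, blen);
		if (n <= 0)
			break;
		if (write(null, buf, (size_t)n) != n)
			err(1, "write");
		done += (size_t)n;
	}
	return 0;
}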

So why would you use mmap vs read ?

If you're measuring CPU cycles and want as many of them as possible
available for processing the captured data in your application.  There
is also more potential for packets being dropped using mmap than read.

The patches I put together to do this are almost the beginnings of
what you might do if you were going to use an explicit ring buffer
for this, rather than just a hold/active-store pair (a ring size of 2).

This path is possibly worth pursuing further.  For example, rather
than having 2*10MB buffers (hold/store), you might have 10*2MB buffers,
the difference being that only one 2MB buffer at a time is locked in as
the "hold" buffer while the other 18MB remains available for buffering.

A copy of the diffs and the quick-n-dirty test programs can be had at:
http://coombs.anu.edu.au/~avalon/bpf.2.tar

For such a small performance gain, is this worth committing ?
Or is the scenario I've outlined for testing a poor measure of
the real benefit/savings ?  Or even, is it a "best-case" scenario
and would others show even less benefit ?

Darren