tech-net archive


Improving the data supplied by BPF

Recently I've talked with a few different folks about packet capture
and have become aware of some of the problems that people face when
trying to use BPF instead of the proprietary solutions that exist.
While it may be possible to capture data at a good rate with BPF, there
is important metadata that it doesn't provide.

This set of diffs attempts to address that by introducing a new BPF
record format that the kernel may emit. At present this is enabled
by issuing an ioctl, which could effectively be turned into something
that "tunes" the data format provided to applications. The other way
I thought of providing a different format was to create a /dev/ebpf,
but that meant a whole lot more trouble.

I haven't yet set to the task of modifying libpcap to take advantage
of the new format - that's a future task - and it is somewhat likely
that I have missed a bpf_tap call hidden away somewhere. In looking
through the kernel, some calls to bpf_tap/bpf_mtap hide behind macros -
do people know why?

But this seems to vaguely work so I'm interested in feedback.

To make things easier, I'm including the proposed new record header
below. The change and choices for time and lengths should be obvious.

The purpose of the sequence number is to provide a rolling counter of
the packets captured, so that each record carries its own position in
the capture. Thus if the sequence number goes from 2 to 5 between
successive records, you know that 2 packets (3 and 4) have been missed.
Providing the record length means that when walking through the data
read, you know exactly how far to jump from the start of one record to
the next, without having to play word-size rounding games. Similarly,
the use of pktoff means that if the record header were to grow, it is
still easy to find the start of the packet data (in a
backwards-compatible way).
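
As a rough illustration of the sequence number check, here is a minimal
sketch. The helper name ebpf_missed is made up for this example and is
not part of the diffs, and it assumes the counter goes up by exactly one
per captured packet and wraps at 32 bits.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not in the diffs): how many records were lost
 * between two records that were read one after the other.
 *
 * Assumes ebr_seqno increases by one per captured packet. The cast
 * keeps the arithmetic modulo 2^32, so the answer is still right when
 * the counter wraps around.
 */
static uint32_t
ebpf_missed(uint32_t prev_seqno, uint32_t cur_seqno)
{
	return (uint32_t)(cur_seqno - prev_seqno - 1u);
}
```

A reader would keep the last ebr_seqno it saw and call this on each new
record; any non-zero result means packets were dropped in between.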

Something that I have considered is whether or not to provide an
additional flags uint32_t to be a copy of the csum_flags from the
mbuf. Being able to echo this information up to tcpdump would mean
that it could be used to do more intelligent things with checksums
that aren't complete. Why not just try to merge that with some
other set of flags? Because the packet flags are very platform
specific and it would be craziness to try and merge application
flags with them.


(and no, I haven't bothered tcpdump-workers about this yet - most
 of the significant work on *BPF* seems to happen anywhere but
 there, now.)

/*
 * Enhanced BPF packet record structure
 */
typedef struct ebpf_rec_s {
       uint64_t        ebr_secs;       /* No more Y2k38 problem */
       uint32_t        ebr_nsecs;
       uint32_t        ebr_seqno;      /* sequence number in capture */
       uint32_t        ebr_flags;
       uint32_t        ebr_rlen;       /* 16 bits is not enough for IPv6   */
       uint32_t        ebr_wlen;       /* Jumbograms, so we have to use    */
       uint32_t        ebr_clen;       /* 32 bits to represent all lengths */
       uint32_t        ebr_pktoff;
       uint16_t        ebr_type;       /* DLT_* type */
       uint16_t        ebr_subtype;
} ebpf_rec_t;

/*
 * rlen = total record length (header + packet)
 * wlen = wire length of packet
 * clen = captured length of packet
 * pktoff = offset from ebr_secs to the start of the packet data (may not be
 *          the same as sizeof(ebpf_rec_t))
 * flags are as below:
 */
#define        EBPF_OUT                0x00000001      /* Transmitted packet */
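
To show how rlen and pktoff might be used by a reader, here is a rough
sketch of walking a buffer returned by read(2). This is not code from
the diffs: ebpf_walk, ebpf_cb_t and count_out are names invented for the
example, and it assumes that rlen already includes any padding the
kernel adds, so the next record starts exactly rlen bytes after the
current one.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EBPF_OUT 0x00000001 /* Transmitted packet */

typedef struct ebpf_rec_s {
	uint64_t ebr_secs;
	uint32_t ebr_nsecs;
	uint32_t ebr_seqno;
	uint32_t ebr_flags;
	uint32_t ebr_rlen;
	uint32_t ebr_wlen;
	uint32_t ebr_clen;
	uint32_t ebr_pktoff;
	uint16_t ebr_type;
	uint16_t ebr_subtype;
} ebpf_rec_t;

/* Hypothetical per-record callback: header copy, packet data, user arg. */
typedef void (*ebpf_cb_t)(const ebpf_rec_t *, const uint8_t *, void *);

/*
 * Walk every complete record in buf and return how many were visited.
 * Stops at the first record whose lengths do not fit in the buffer.
 */
static size_t
ebpf_walk(const uint8_t *buf, size_t len, ebpf_cb_t cb, void *arg)
{
	size_t off = 0, count = 0;

	while (len - off >= sizeof(ebpf_rec_t)) {
		ebpf_rec_t hdr;

		/* Copy the header out so alignment of buf does not matter. */
		memcpy(&hdr, buf + off, sizeof(hdr));

		/* rlen must cover the header and stay inside the buffer. */
		if (hdr.ebr_rlen < sizeof(hdr) || hdr.ebr_rlen > len - off)
			break;
		/* The captured bytes must lie inside this record. */
		if (hdr.ebr_pktoff > hdr.ebr_rlen ||
		    hdr.ebr_clen > hdr.ebr_rlen - hdr.ebr_pktoff)
			break;

		/* Packet data is found via pktoff, not sizeof(hdr). */
		cb(&hdr, buf + off + hdr.ebr_pktoff, arg);

		off += hdr.ebr_rlen;	/* exact jump to the next record */
		count++;
	}
	return count;
}

/* Example callback: count the packets flagged as transmitted. */
static void
count_out(const ebpf_rec_t *hdr, const uint8_t *pkt, void *arg)
{
	(void)pkt;
	if (hdr->ebr_flags & EBPF_OUT)
		(*(size_t *)arg)++;
}
```

Because the packet is located with pktoff rather than sizeof(ebpf_rec_t),
a reader built against this header should keep working if a later kernel
appends fields to the record header.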

Attachment: cvs-u.diffs.bz2
Description: Binary data
