Recently I've talked with a few different folks about packet capture and have become aware of some of the problems that people face when trying to use BPF vs other proprietary solutions that exist. While it may be possible to capture data at a good rate with BPF, there is important metadata that isn't provided.

This set of diffs attempts to address that by introducing a new BPF record format that the kernel can generate. At present this is enabled by issuing an ioctl, which could effectively be turned into something that "tunes" the data format provided to applications. The other way I thought of providing a different format was to create a /dev/ebpf, but that meant a whole lot more trouble.

I haven't yet set to the task of modifying libpcap to take advantage of the new format - that's a future task - and it is somewhat likely that I have missed a bpf_tap hidden away somewhere. In looking through the kernel, some calls to bpf_tap/bpf_mtap hide behind macros - do people know why? But this seems to vaguely work, so I'm interested in feedback.

To make things easier, I'm including the proposed new record header below. The changes and choices for time and lengths should be obvious.

The purpose of the sequence number is to provide a rolling count of the packets captured, as of the one in question. Thus if two successive records carry sequence numbers 2 and 5, you know that two packets (3 and 4) have been missed.

Providing the record length means that when walking through the data read, you know exactly how far to jump from the start of one record to the next, without having to play word-size rounding games. Similarly, the use of pktoff means that if the record header were to grow, it is still easy to find the start of the packet data (in a backwards-compatible fashion.)

Something that I have considered is whether or not to provide an additional uint32_t flags field that is a copy of the csum_flags from the mbuf.
Being able to echo this information up to tcpdump would mean that it could do more intelligent things with checksums that aren't complete. Why not just try to merge that with some other set of flags? Because the packet flags are very platform specific and it would be craziness to try and merge application flags with them.

Thoughts?

Darren

(and no, I haven't bothered tcpdump-workers about this yet - most of the significant work on *BPF* seems to happen anywhere but there, now.)

/*
 * Enhanced BPF packet record structure
 */
typedef struct ebpf_rec_s {
	uint64_t	ebr_secs;	/* No more Y2k38 problem */
	uint32_t	ebr_nsecs;
	uint32_t	ebr_seqno;	/* sequence number in capture */
	uint32_t	ebr_flags;
	uint32_t	ebr_rlen;	/* 16 bits is not enough for IPv6 */
	uint32_t	ebr_wlen;	/* Jumbograms, so we have to use */
	uint32_t	ebr_clen;	/* 32 bits to represent all lengths */
	uint32_t	ebr_pktoff;
	uint16_t	ebr_type;	/* DLT_* type */
	uint16_t	ebr_subtype;
} ebpf_rec_t;

/*
 * rlen = total record length (header + packet)
 * wlen = wire length of packet
 * clen = captured length of packet
 * pktoff = offset from ebr_secs to the start of the packet data (may not be
 *          the same as sizeof(ebpf_rec_t))
 *
 * flags are as below:
 */
#define	EBPF_OUT	0x00000001	/* Transmitted packet */
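To make the intended use of rlen, pktoff and seqno a bit more concrete, here is a rough sketch of how an application might walk a buffer of these records. It is not part of the diffs: ebpf_walk() and ebpf_walk_t are made-up names for illustration, and it assumes that records sit back to back in the buffer that was read, that ebr_seqno goes up by exactly one per captured packet, and that the application sees the header in host byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Record header as proposed in this mail. */
typedef struct ebpf_rec_s {
	uint64_t	ebr_secs;
	uint32_t	ebr_nsecs;
	uint32_t	ebr_seqno;
	uint32_t	ebr_flags;
	uint32_t	ebr_rlen;
	uint32_t	ebr_wlen;
	uint32_t	ebr_clen;
	uint32_t	ebr_pktoff;
	uint16_t	ebr_type;
	uint16_t	ebr_subtype;
} ebpf_rec_t;

/* Hypothetical per-descriptor state kept by the application. */
typedef struct ebpf_walk_s {
	int		ew_started;	/* set once a record has been seen */
	uint32_t	ew_lastseq;	/* seqno of the last record seen */
	uint64_t	ew_seen;	/* records walked so far */
	uint64_t	ew_missed;	/* packets inferred lost from seqno gaps */
} ebpf_walk_t;

/*
 * Walk every complete record in buf[0..len).  Returns the number of
 * records walked, or -1 if a header carries inconsistent lengths.
 */
static int
ebpf_walk(ebpf_walk_t *ew, const uint8_t *buf, size_t len)
{
	size_t off = 0;
	int count = 0;

	while (len - off >= sizeof(ebpf_rec_t)) {
		ebpf_rec_t hdr;
		const uint8_t *pkt;

		/* Copy the header out: records need not be aligned. */
		memcpy(&hdr, buf + off, sizeof(hdr));

		if (hdr.ebr_rlen < sizeof(hdr) || hdr.ebr_rlen > len - off)
			return (-1);
		if (hdr.ebr_pktoff > hdr.ebr_rlen ||
		    hdr.ebr_clen > hdr.ebr_rlen - hdr.ebr_pktoff)
			return (-1);

		/* pktoff, not sizeof(hdr), locates the packet data. */
		pkt = buf + off + hdr.ebr_pktoff;
		(void)pkt;	/* hand pkt/ebr_clen to the consumer here */

		/* Unsigned arithmetic keeps this right across rollover. */
		if (ew->ew_started)
			ew->ew_missed +=
			    (uint32_t)(hdr.ebr_seqno - ew->ew_lastseq - 1);
		ew->ew_started = 1;
		ew->ew_lastseq = hdr.ebr_seqno;
		ew->ew_seen++;
		count++;

		/* rlen is the exact stride to the next record. */
		off += hdr.ebr_rlen;
	}
	return (count);
}
```

The state is kept outside the function so that a gap which straddles two read() calls is still counted. Note that the loop never uses sizeof(ebpf_rec_t) to find either the packet data or the next record, which is the point of carrying pktoff and rlen in the header.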
Attachment:
cvs-u.diffs.bz2
Description: Binary data