Subject: Re: mtod abuse?
To: Pavel Cahyna <firstname.lastname@example.org>
From: mouss <email@example.com>
Date: 08/08/2004 01:04:33
"It is improbable that the link layer would split even the largest ...
IP header into two mbufs ...." [W.R. Stevens]
There is no reason for any layer, other than those that do encapsulation
or decapsulation (such as IPsec), to store layer headers in multiple
mbufs. So the Ethernet+IP+TCP headers should all be in one mbuf. If this
becomes untrue, then the whole mbuf design becomes a bit suboptimal.
As far as I can tell, the only justifications for the rather complex mbuf
system (which helped me discover the panic world when I was "virgin") are:
- layers can add/remove headers
- in the usual case, there is no need for mbuf chains (i.e. the major
headers that need to be updated are in one mbuf. This includes the L2, IP
and transport headers. Of course, later came IPsec...)
That said, I'm not sure the mbuf design is still justified, as
encapsulation (IPsec, for instance) breaks the "static packet size"
model, so there is no way to ensure the "major" headers will be in the
first mbuf. Also, consider that any filtering code (ipf, for instance)
needs to m_pullup() to get at a header, and that this is done multiple
times. One has to ask why an m_pullup() isn't done once at the start (and
redone whenever the packet is encapsulated or decapsulated), so that
other pieces of code don't need to check and pull up again. I did so in
an encap/decap implementation (not for security, though) and it helped
remove a lot of checks. (That implementation only dealt with TCP and UDP
packets, but given that most packets are TCP or UDP, it's better to
improve the "usual" case.)
So as part of the pfil hooks, I'd like to see an m_pullup(), plus a
requirement that filters that change packets guarantee that the IP and
transport headers stay in the first mbuf. This way, most filters won't
need to check and pull up. Of course, at the time the IP stack was
written, there was no reason to look at TCP headers in the IP-level
functions. But since then, IP filters and other things have come in. I
now see no problem with including tcp.h and friends in ip_input.c, for
instance...
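To make the "pull up once at the hook entry" idea concrete, here is a
minimal userland sketch. It is not kernel code: struct toy_mbuf,
toy_pullup(), toy_run_hooks() and toy_is_ipv4() are hypothetical names
invented for illustration, and the fixed 256-byte buffer is an
assumption. It only models the contract (headers contiguous in the first
buffer, chain freed on failure) that the real m_pullup() provides.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for an mbuf; NOT the kernel's struct mbuf. */
struct toy_mbuf {
    struct toy_mbuf *next;    /* next buffer in the chain */
    size_t len;               /* valid bytes in data[] */
    unsigned char data[256];  /* fixed storage (assumption for the sketch) */
};

/* Allocate one toy buffer holding a copy of len bytes from src. */
static struct toy_mbuf *toy_alloc(const void *src, size_t len)
{
    struct toy_mbuf *m;

    if (len > sizeof(m->data))
        return NULL;
    m = calloc(1, sizeof(*m));
    if (m == NULL)
        return NULL;
    memcpy(m->data, src, len);
    m->len = len;
    return m;
}

/* Free a whole chain. */
static void toy_freem(struct toy_mbuf *m)
{
    while (m != NULL) {
        struct toy_mbuf *n = m->next;
        free(m);
        m = n;
    }
}

/*
 * Make the first `need` bytes contiguous in the first buffer.
 * Returns the head on success; on failure frees the chain and
 * returns NULL.
 */
static struct toy_mbuf *toy_pullup(struct toy_mbuf *m, size_t need)
{
    if (m == NULL)
        return NULL;
    if (need > sizeof(m->data)) {
        toy_freem(m);
        return NULL;
    }
    while (m->len < need) {
        struct toy_mbuf *n = m->next;
        size_t take;

        if (n == NULL) {            /* chain is shorter than `need` */
            toy_freem(m);
            return NULL;
        }
        take = need - m->len;
        if (take > n->len)
            take = n->len;
        memcpy(m->data + m->len, n->data, take);
        m->len += take;
        memmove(n->data, n->data + take, n->len - take);
        n->len -= take;
        if (n->len == 0) {          /* drop buffers we emptied */
            m->next = n->next;
            free(n);
        }
    }
    return m;
}

/* A filter sees contiguous header bytes; nonzero means "drop". */
typedef int (*toy_filter)(const unsigned char *hdr, size_t hdrlen);

/* Example filter: pass only packets whose version nibble is 4. */
static int toy_is_ipv4(const unsigned char *hdr, size_t hdrlen)
{
    return (hdrlen >= 1 && (hdr[0] >> 4) == 4) ? 0 : 1;
}

/*
 * Hypothetical hook entry: pull up once, then run every filter.
 * Returns 0 = pass, 1 = dropped by a filter, -1 = pullup failed
 * (the chain is freed and *mp is set to NULL).
 */
static int toy_run_hooks(struct toy_mbuf **mp, size_t hdrlen,
                         toy_filter *filters, size_t nfilters)
{
    size_t i;

    *mp = toy_pullup(*mp, hdrlen);
    if (*mp == NULL)
        return -1;
    for (i = 0; i < nfilters; i++) {
        /* Invariant the filters rely on instead of re-checking. */
        assert((*mp)->len >= hdrlen);
        if (filters[i]((*mp)->data, hdrlen) != 0)
            return 1;
    }
    return 0;
}
```

The point of the design is that the length check lives in exactly one
place (toy_run_hooks), so each filter can read the header bytes directly.
A filter that rewrites or re-encapsulates the packet would have to
restore that invariant before returning.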
BTW, does anyone have a comparison of mbufs with Solaris mblocks (mblk_t)
and Linux skbuffs (struct sk_buff), both in terms of performance and
usability?