Subject: Re: mtod abuse?
To: Pavel Cahyna <pavel.cahyna@st.mff.cuni.cz>
From: mouss <usebsd@free.fr>
List: tech-net
Date: 08/08/2004 01:04:33
"It is improbable that the link layer would split even the largest ... 
IP header into two mbufs ...." [R.W. Stevens]

There is no reason for any layer other than those that do encapsulation
or decapsulation (such as IPsec) to store layer headers in multiple
mbufs, so the Ethernet, IP and TCP headers should all be in one mbuf. If
this stops being true, then the whole mbuf design becomes a bit
suboptimal.
As far as I can tell, the only justification for the rather complex mbuf
system (which introduced me to the world of panics when I was new to
it) is that:
  - layers can add/remove headers
  - in the usual case, there is no need for mbuf chains (i.e. the major
headers that need to be updated are in one mbuf. This includes the L2,
IP and transport headers. Of course, later came IPsec...)
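
To make the first point concrete, here is a minimal userland sketch of
"add/remove a header without copying the payload". It is a toy model
only: struct toy_buf and the toy_* functions are made-up names, not the
kernel's struct mbuf or M_PREPEND(). The idea it shows is that room left
in front of the data lets a layer prepend its header in place.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_BUFSZ 128

/* Toy stand-in for a single mbuf (hypothetical, NOT the kernel struct). */
struct toy_buf {
    unsigned char storage[TOY_BUFSZ];
    size_t off;   /* offset of the first valid byte in storage */
    size_t len;   /* number of valid bytes starting at off */
};

/* Put the payload at the tail of storage so headroom is left in front. */
static void toy_init(struct toy_buf *b, const void *payload, size_t n)
{
    assert(n <= TOY_BUFSZ);
    b->off = TOY_BUFSZ - n;
    b->len = n;
    memcpy(b->storage + b->off, payload, n);
}

/* Prepend a header by moving the start of data backwards.
 * The payload itself is not copied.
 * Returns 0 on success, -1 if there is not enough headroom. */
static int toy_prepend(struct toy_buf *b, const void *hdr, size_t n)
{
    if (n > b->off)
        return -1;
    b->off -= n;
    b->len += n;
    memcpy(b->storage + b->off, hdr, n);
    return 0;
}

/* Strip n bytes of header from the front.
 * Returns 0 on success, -1 if fewer than n bytes are present. */
static int toy_strip(struct toy_buf *b, size_t n)
{
    if (n > b->len)
        return -1;
    b->off += n;
    b->len -= n;
    return 0;
}

/* Pointer to the first valid byte (roughly what mtod() gives you). */
static const unsigned char *toy_data(const struct toy_buf *b)
{
    return b->storage + b->off;
}
```

When toy_prepend() fails here, the real kernel would instead chain a new
mbuf in front, and that is the point where headers can end up split
across mbufs.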

That said, I'm not sure the mbuf design is still justified, as
encapsulation (IPsec for instance) breaks the "static packet size"
model, so there is no way to ensure that the "major" headers will be in
the first mbuf. Also, any filtering code (such as ipf) needs to call
m_pullup() to get at a header, and this is done multiple times along the
path. So one has to ask why a single m_pullup() isn't done at the start
(and redone whenever the packet is encapsulated or decapsulated), so
that other pieces of code don't need to check and pull up again. I did
this in an encap/decap implementation (not for security, though) and it
helped remove a lot of checks. (That implementation only dealt with TCP
and UDP packets, but given that most packets are TCP or UDP, it is
better to optimize the "usual" case.)

So, as part of the pfil hooks, I'd like to see an m_pullup(), plus a
requirement that filters which change packets guarantee that the IP and
transport headers are still in the first mbuf. This way, most filters
won't need to check and pull up. Of course, at the time the IP stack was
written, there was no reason to look at TCP headers in the IP-level
functions, but since then IP filters and other things have come in. I
now see no problem with including tcp.h and friends in ip_input.c, for
instance...
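
To illustrate the "pull up once at hook entry" idea, here is a rough
userland sketch. Everything in it is a toy model and an assumption on my
part: struct toy_mbuf, toy_pullup(), toy_hook_entry() and
toy_filter_dport() are made-up names, not the kernel's struct mbuf,
m_pullup() or the pfil API. It also assumes a 20-byte IP header (no
options) followed by a TCP header.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MLEN     64
#define TOY_IP_HLEN  20   /* assumption: IP header without options */
#define TOY_TCP_HLEN 20   /* assumption: TCP header without options */

/* Toy model of one element of an mbuf chain (NOT the kernel struct). */
struct toy_mbuf {
    struct toy_mbuf *next;
    size_t len;                    /* valid bytes in data[] */
    unsigned char data[TOY_MLEN];
};

/* Make the first `want` bytes of the chain contiguous in the head by
 * moving bytes forward from the following buffers.
 * Returns 0 on success, -1 if the chain is too short or want is larger
 * than one buffer. Unlike the real m_pullup(), this toy never allocates
 * and never frees the chain on failure. */
static int toy_pullup(struct toy_mbuf *m, size_t want)
{
    struct toy_mbuf *n = m->next;

    if (want > TOY_MLEN)
        return -1;
    while (m->len < want) {
        size_t take;

        while (n != NULL && n->len == 0)   /* skip drained buffers */
            n = n->next;
        if (n == NULL)
            return -1;
        take = want - m->len;
        if (take > n->len)
            take = n->len;
        memcpy(m->data + m->len, n->data, take);
        memmove(n->data, n->data + take, n->len - take);
        m->len += take;
        n->len -= take;
    }
    return 0;
}

/* Hook entry: pull up IP + transport headers once, before any filter. */
static int toy_hook_entry(struct toy_mbuf *m)
{
    return toy_pullup(m, TOY_IP_HLEN + TOY_TCP_HLEN);
}

/* A filter that runs after toy_hook_entry() can read the headers
 * directly from the first buffer; the assert documents the invariant
 * instead of a per-filter check-and-pullup. */
static unsigned toy_filter_dport(const struct toy_mbuf *m)
{
    const unsigned char *tcp;

    assert(m->len >= TOY_IP_HLEN + TOY_TCP_HLEN);
    tcp = m->data + TOY_IP_HLEN;
    /* TCP destination port: bytes 2-3 of the header, network order. */
    return ((unsigned)tcp[2] << 8) | tcp[3];
}
```

Any filter that changes the packet (encap/decap) would have to call
something like toy_hook_entry() again before handing the packet on, so
that the invariant holds for whoever runs next.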

BTW, does anyone have a comparison of mbufs with Solaris mblks and
Linux sk_buffs (in terms of both performance and usability)?