Subject: mbuf pros and cons [was Re: mtod abuse?]
To: mouss <usebsd@free.fr>
From: Pavel Cahyna <pcah8322@artax.karlin.mff.cuni.cz>
List: tech-net
Date: 08/09/2004 13:53:26
> "It is improbable that the link layer would split even the largest ... 
> IP header into two mbufs ...." [R.W. Stevens]

Well, I was not talking about probability, but possibility. My question
was, "is this code buggy?"

> 
> There is no reason for any layer but those that do encapsulation or 
> decapsulation (such as ipsec) to store layer headers in multiple mbufs. 
> so ethernet+IP+TCP headers should all be in one mbuf. If this becomes 
> untrue, then the whole mbuf design becomes a bit suboptimal.

Suboptimal, but still working if used correctly.
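
To make "used correctly" concrete, here is a rough userspace sketch of the
pattern I have in mind: check m_len before casting with mtod, and pull the
header up only when it is not already contiguous. Everything prefixed toy_
is a made-up model for illustration, not the kernel's struct mbuf, mtod or
m_pullup, and the 8-byte toy_hdr is a hypothetical header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MLEN 64                 /* toy capacity, not the kernel's MLEN */

struct toy_mbuf {
    struct toy_mbuf *m_next;        /* next mbuf in the chain */
    unsigned char   *m_data;        /* start of valid data */
    size_t           m_len;         /* valid bytes in this mbuf only */
    unsigned char    m_buf[TOY_MLEN];
};

/* Like the kernel macro: a plain cast, with no length check of any kind. */
#define toy_mtod(m, t) ((t)(void *)(m)->m_data)

/* Allocate one mbuf holding a copy of len bytes (len may be 0). */
static struct toy_mbuf *toy_get(const void *src, size_t len)
{
    struct toy_mbuf *m;

    if (len > TOY_MLEN || (m = calloc(1, sizeof(*m))) == NULL)
        return NULL;
    m->m_data = m->m_buf;
    m->m_len = len;
    if (len > 0)
        memcpy(m->m_data, src, len);
    return m;
}

static void toy_freem(struct toy_mbuf *m)
{
    while (m != NULL) {
        struct toy_mbuf *next = m->m_next;
        free(m);
        m = next;
    }
}

/*
 * Make the first len bytes of the chain contiguous in the first mbuf.
 * Returns the head to use from now on; on failure the whole chain is
 * freed and NULL is returned.
 */
static struct toy_mbuf *toy_pullup(struct toy_mbuf *m, size_t len)
{
    struct toy_mbuf *head;

    assert(m != NULL);
    if (m->m_len >= len)
        return m;                   /* usual case: already contiguous */
    head = (len <= TOY_MLEN) ? toy_get(NULL, 0) : NULL;
    if (head == NULL) {
        toy_freem(m);
        return NULL;
    }
    while (m != NULL && head->m_len < len) {
        size_t take = len - head->m_len;

        if (take > m->m_len)
            take = m->m_len;
        memcpy(head->m_data + head->m_len, m->m_data, take);
        head->m_len += take;
        m->m_data += take;
        m->m_len -= take;
        if (m->m_len == 0) {        /* drained: unlink and free it */
            struct toy_mbuf *next = m->m_next;
            free(m);
            m = next;
        }
    }
    head->m_next = m;
    if (head->m_len < len) {        /* chain was shorter than len */
        toy_freem(head);
        return NULL;
    }
    return head;
}

struct toy_hdr {                    /* hypothetical 8-byte header */
    unsigned char bytes[8];
};

/* Read the last header byte safely; -1 if the packet is too short. */
static int toy_hdr_last_byte(struct toy_mbuf **mp)
{
    if ((*mp)->m_len < sizeof(struct toy_hdr) &&
        (*mp = toy_pullup(*mp, sizeof(struct toy_hdr))) == NULL)
        return -1;
    return toy_mtod(*mp, struct toy_hdr *)->bytes[7];
}
```

The length test in front of the pullup is the whole point: when the header
is already in the first mbuf it costs one comparison, and when it is not,
the cast would otherwise read past the valid data.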

> As far as I can tell, the only justification for the so-complex mbuf 
> system (which helped me discover the panic world when I was "virgin") is 
> that:
>  - layers can add/remove headers
>  - in the usual case, there is no need for mbuf chains (i.e. major 
> headers that need to be updated are in one mbuf. This include L2, IP and 
> transport headers. Of course, later came IPSec...)
> 
> That said, I'm not sure the mbuf design is still justified, as the 
> encapsulation things (ipsec for instance) break the "static packet size" 
> model, so there is no way to ensure "major" headers will be in the first 
> mbuf. Also, if you consider that any filtering task (such as ipf code 

Isn't this an argument for the mbuf system, since it keeps working
correctly even in such cases? If all the headers were guaranteed to be in
the first mbuf, why would the mbuf system be needed?

> for instance) will need to mpull to get a header, and that this is done 
> multiple times, one gets to ask why isn't an mpull done at start (and 
> redone whenever the packet is encap/decapsulated) to make sure other 
> pieces of code won't need to check and mpull. I did so in an encap/decap 
> (...)

So, your complaint is that m_pullup must be done many times and most of
the calls are unneeded?

> BTW, does anyone have a comparison of mbufs with solaris mblocks and 
> linux skbufs (both in terms of perf and usability)?

There is a short comparison in Czech here:
http://www.root.cz/clanek/2060

Briefly, an skbuf is a single contiguous memory area with enough spare
room in front of the data to prepend any headers, while the data in mbufs
can be fragmented across multiple mbufs chained together. Chaining makes
zero-copy send easy and clean to implement: you allocate an mbuf which
points to the user data, and the headers are prepended to it in separate
mbufs. To get zero-copy send, Linux skbufs had to be hacked somehow.
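
A rough userspace sketch of the chaining idea, under my own assumptions:
zc_mbuf and the zc_ functions are hypothetical names for illustration only,
not the kernel API. One mbuf merely points at the caller's data, the header
lives in a separate mbuf linked in front of it, and the payload is copied
only if something has to gather the chain into one flat buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct zc_mbuf {
    struct zc_mbuf      *m_next;   /* next mbuf in the chain */
    const unsigned char *m_data;   /* valid bytes; may be external storage */
    size_t               m_len;    /* number of valid bytes */
    unsigned char       *m_store;  /* owned copy, NULL when data is external */
};

/* Wrap caller-owned data without copying it. */
static struct zc_mbuf *zc_wrap(const void *data, size_t len)
{
    struct zc_mbuf *m = calloc(1, sizeof(*m));

    if (m == NULL)
        return NULL;
    m->m_data = data;
    m->m_len = len;
    return m;
}

/* Put a header in its own mbuf and link it in front of the chain m. */
static struct zc_mbuf *zc_prepend(struct zc_mbuf *m, const void *hdr,
                                  size_t hlen)
{
    struct zc_mbuf *h = calloc(1, sizeof(*h));

    if (h == NULL)
        return NULL;
    h->m_store = malloc(hlen > 0 ? hlen : 1);
    if (h->m_store == NULL) {
        free(h);
        return NULL;
    }
    if (hlen > 0)
        memcpy(h->m_store, hdr, hlen);
    h->m_data = h->m_store;
    h->m_len = hlen;
    h->m_next = m;
    return h;
}

/* Copy the whole chain into out; returns bytes written, 0 if it won't fit. */
static size_t zc_gather(const struct zc_mbuf *m, unsigned char *out,
                        size_t cap)
{
    size_t off = 0;

    assert(out != NULL);
    for (; m != NULL; m = m->m_next) {
        if (m->m_len > cap - off)
            return 0;
        if (m->m_len > 0)
            memcpy(out + off, m->m_data, m->m_len);
        off += m->m_len;
    }
    return off;
}

/* Free the chain; external data is left alone because we never owned it. */
static void zc_freem(struct zc_mbuf *m)
{
    while (m != NULL) {
        struct zc_mbuf *next = m->m_next;
        free(m->m_store);
        free(m);
        m = next;
    }
}
```

In this model the user data is never touched by zc_wrap or zc_prepend; only
zc_gather copies it, which stands in for hardware or a driver that cannot
send from scattered buffers.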

Unfortunately, the article does not compare speed. But I believe all the
benefits of skbufs can be obtained with mbufs by allocating sufficiently
large mbufs. Is that right?
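
To make the question concrete, here is a rough userspace sketch of what I
mean by a sufficiently large buffer: one area with the data placed some way
in, so headers can be prepended in place. hr_buf and the hr_ functions are
hypothetical names of my own; as far as I know the closest real
counterparts are M_LEADINGSPACE and M_PREPEND for mbufs, and skb_reserve
and skb_push on the Linux side.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HR_SIZE 128                 /* toy buffer size */

struct hr_buf {
    unsigned char  store[HR_SIZE];  /* the single memory area */
    unsigned char *data;            /* start of valid bytes */
    size_t         len;             /* number of valid bytes */
};

/* Start with an empty buffer whose data pointer is headroom bytes in. */
static void hr_init(struct hr_buf *b, size_t headroom)
{
    assert(headroom <= HR_SIZE);
    b->data = b->store + headroom;
    b->len = 0;
}

/* Free bytes in front of the data. */
static size_t hr_leading(const struct hr_buf *b)
{
    return (size_t)(b->data - b->store);
}

/* Free bytes behind the data. */
static size_t hr_trailing(const struct hr_buf *b)
{
    return HR_SIZE - hr_leading(b) - b->len;
}

/* Append payload bytes; -1 if they do not fit. */
static int hr_append(struct hr_buf *b, const void *src, size_t n)
{
    if (n > hr_trailing(b))
        return -1;
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 0;
}

/* Prepend in place; -1 means no room, where mbuf code would chain instead. */
static int hr_prepend(struct hr_buf *b, const void *hdr, size_t n)
{
    if (n > hr_leading(b))
        return -1;
    b->data -= n;
    memcpy(b->data, hdr, n);
    b->len += n;
    return 0;
}
```

The failure case of hr_prepend is where the two designs differ: a
fixed-headroom buffer has to reallocate and copy, while a chain can simply
grow by one more mbuf in front.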