Subject: status of the mbuf API SoC project
To: None <>
From: Pavel Cahyna <>
List: tech-net
Date: 09/03/2006 23:21:02

we should think about merging the results of my SoC project. I'll give a
brief overview here.

I've added two macros to replace mtod: mptr and mptr_rw. Their
"prototypes" are:

const datatype *
mptr(struct mbuf **mp, datatype, int off, int flags);

datatype *
mptr_rw(struct mbuf **mp, datatype, int off, int flags);

differences from mtod:

- mtod accepts the type of the resulting pointer, mptr* accept the type
  that is being pointed to. This is to be able to apply sizeof() to the
  type inside the macros, and in the case of mptr, to add const.

- the new macros accept an offset, to access structures which are not at
  the beginning of the packet.

- struct mbuf *m was changed to struct mbuf **mp, enabling those macros to
  rearrange the mbuf chain.

The change in semantics is that the macros explicitely check if the
pointer is valid, that is, if there is at least sizeof(datatype)
contiguous data in the mbuf. In the case of mptr, the writability of the
mbuf is also checked.

What happens if the check fails depends on the value of flags. If they
include MP_PULLUP, the chain is rearranged to meet those requirements.
This way, mptr can replace the calls to m_pullup or m_pulldown.

Without MP_PULLUP, such calls trigger a panic.

Sometimes, passing the size implicitely through sizeof is inconvenient, so
I've made function variants m_datarange and m_datarange_rw which accept an
explicit size:

const void *
m_datarange(struct mbuf **mp, int len, int off, int flags);

void *
m_datarange_rw(struct mbuf **mp, int len, int off, int flags);

In those functions, MP_PULLUP is implied. They also accept more flags so
those function can replace the current m_copyup.

Please see the manual page at for details.

I have converted all the net, netinet, netinet6 and net80211 to the new
functions where suitable (this resulted in a lot of constification). The
result is that the input path should be now read-only mbuf safe (read-only
mbufs were causing a lot of problems before). I've checked it with a patch
to Xen provided by bouyer@ which maps mbuf external storage read-only.

A possibly controversial change was done to the IP and IPv6 input
routines. They now use m_datarange to make a large (512 bytes) contiguous
region, which should be enough for all the headers. Then the
implementation of upper-layer protocols do not need to to worry about
contiguous mbufs. A problem could arise when the IP header is added in the
output path in a separate mbuf (for example because the data are shared),
and then the packet is reflected back by the loopback device. In this
case, we have to copy much more data than usually needed to make those 512
bytes contiguous with no good use. I did not benchmark this yet, but I
guess it will have a negative impact on performance.

Besides that, another subject for discussion is the naming and arguments
of the proposed macros and functions. The arguments of m_datarange were
modelled after the arguments of mptr, with datatype replaced by the
requested size. This seemed natural. But after that I realized that other
mbuf routines have "off" and "len" arguments reversed (see m_copydata for
an example) so I should probably swap those arguments for m_datarange too,
and swap "datatype" with "off" in mptr to match.

Please have a look at if
you are interested.