Subject: WARNING: m_pulldown considered harmful
To: None <tech-net@netbsd.org>
From: Darren Reed <avalon@caligula.anu.edu.au>
List: tech-net
Date: 04/29/2004 21:55:54
So some of the problems with ipfilter currently being observed are that
it tries to do m_pullup(m, packetsize) on some packets.  Which is fine
for packetsize<=MHLEN.  In order to do packetsize>=MHLEN, another API
call is required.  In a quick look, there is a number of mbuf functions
here that all do approximately the same thing, but not quite.  The two
I'm going to rant about in this email are:
m_pullup
m_pulldown

(and there are other related ones but lets concentrate on these two
 for now...)

m_pullup() is used in lots of places and mostly "works" and is
a nice and clean interface.  However, m_pulldown() is antoher
metter.  With "int *offp" as the last parameter in m_pulldown(),
this seems like a badly designed interface.  Lets have a quick
look at one use of it:
                t = m_pulldown((m), (off), (len), &tmp);
                if (t) {
                        if (t->m_len < tmp + (len))
                                panic("m_pulldown malfunction");
                        (val) = (typ)(mtod(t, caddr_t) + tmp);
                } else {
                        (val) = (typ)NULL;
                        (m) = NULL;
                }
(trailing \'s removed)

WARNING: use of m_pulldown() considered harmful.

This is disgusting.  m_pulldown() is returning an mbuf for which
we don't get the correct pointer via use of mtod().  m_pulldown()
should be leaving the mbuf in a consistant state, like other
interfaces to mbuf's do, where mtod(t, caddr_t) would return the
correct result.  That there's a panic message here is further
evidence of bad design - this panic message, like other mbuf
related ones, should be generated by m_pulldown, if at all.
This is just one instance of its use - almost every other use has
a mtod() with a "+ tmp" because of this garbage.  (Sorry, I can't
be nice about this, it really is trash if no other part of the
mbuf API works like this.)

Using m_pulldown() anywhere that is follwed by code that expects
to be able to use mtod() sanely will cause bad things to happen.

So what would I like to see happen ?

Personally, I think if len>MHLEN, m_pullup should create a
clustered mbuf and return that.  *I* shouldn't have to care
about what type the mbuf currently is, that's the business of
the internals of the mbuf API.  If I want to put more in the
current buffer than it can handle then it should just create
a new one big enough and put it in there.  If I get a NULL
return, I want that to mean that there isn't enough kernel
memory, not because MHLEN is too small.  If this model breaks
things then m_pulldown() needs to be fixed.  Fixed such that
mbuf's returned from it are safe for code anywhere in the
kernel to do mtod() on a returned mbuf and get the pointer
they expect to get (and not require some magic offset value
to be available.)  I can't see that being a problem for any
code that currently uses m_pulldown() (in fact it would more
than likely improve a lot of related code everywhere.)

btw, another disgusting problem with m_pulldown is you see
lots of code like this:
		n = m_pulldown(m, off, l, &o);
		comb = (struct sadb_comb *)(mtod(n, caddr_t) + o);
mtod being used with caddr_t so that 'o' can be added to it rather
than this:
		n = m_pulldown(m, off, l);
		comb = mtod(n, struct sadb_comb *);

Maybe this problem is as easily fixed by adding:
	n->m_data += off
at the end of m_pulldown() (followed by fixing all uses of
m_pulldown shortly thereafter) and removing offp but I can't
be sure that's safe or right, just now.

In summary, there appears to currently be no safe way in the
current mbuf API to do m_pullup(m,len) where len > MHLEN and
have that work throughout the kernel without risk.  I'm just
as reluctant to add something else because (a) it shouldn't
be needed (b) it'll duplicate a lot of code already there and
(c) it'll bloat the API needlessly.

God I wish I'd found this before we decided to branch for 2.0.

Darren

p.s. FWIW, Solaris has two functions to do this: pullupmsg()
and msgpullup().  Both take two args: a pointer to the mbuf
equivalent and a length - it's up to calling code to manage
pointers if some 'offset' is important.