tech-net archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

m_ext_free() flawed



I'm more and more convinced m_ext_free() (from yamt@'s lazy-mbuf branch) is 
flawed. It appears to me there's confusion (I myself ran into) because m_ext 
is m_ext_ref->m_ext_storage, there are two
	m->m_ext_ref = m;
assignements I don't understand and a
	m->m_ext.ext_refcnt++; /* XXX */
that looks even weirder and it recursively calls itself.

My conclusion after several weeks of investigation and debugging is that 
the logic should be:

First, deal with the mbuf m->m_ext_ref points to (which may well be m itself): 
decrement the reference count, if it drops to zero, discard both the external 
storage and that mbuf itself.

Second, in case the mbuf dealt with in the first step is not m itself, 
unconditionally discard m: since m->m_ext_ref doesn't point to m, no other 
mbuf's m_ext_ref should point to m. KASSERT() that m's reference count is 
zero (set in MCLADDREFERENCE()).

So, apart from a change to MCLADDREFERENCE() that is, strictly speaking, only 
for clarity (and needed for the KASSERT()):

--- kern/uipc_mbuf.c	25 Oct 2021 15:49:49 -0000	1.172.6.6
+++ kern/uipc_mbuf.c	5 Apr 2024 12:23:59 -0000
@@ -142,6 +142,7 @@
 	KASSERT(((n)->m_flags & M_EXT) == 0);				\
 	KASSERT((o)->m_ext.ext_refcnt >= 1);				\
 	(n)->m_flags |= ((o)->m_flags & M_EXTCOPYFLAGS);		\
+	(n)->m_ext_storage.ext_refcnt = 0;				\
 	atomic_inc_uint(&(o)->m_ext.ext_refcnt);			\
 	(n)->m_ext_ref = (o)->m_ext_ref;				\
 	mowner_ref((n), (n)->m_flags);					\

I now have this m_ext_free():

void
m_ext_free(struct mbuf *m)
{
	struct mbuf *m1 = m->m_ext_ref; /* may well be m itself */
	u_int refcnt;

	KASSERT((m->m_flags & M_EXT) != 0);
	KASSERT(MEXT_ISEMBEDDED(m1));
	KASSERT((m1->m_flags & M_EXT) != 0);
	KASSERT((m->m_flags & M_EXT_CLUSTER) ==
	    (m1->m_flags & M_EXT_CLUSTER));

	if (__predict_true(m1->m_ext_storage.ext_refcnt == 1)) {
		refcnt = m1->m_ext_storage.ext_refcnt = 0;
	} else {
		refcnt = atomic_dec_uint_nv(&m1->m_ext_storage.ext_refcnt);
	}
	if (refcnt == 0) {
		if ((m1->m_flags & M_EXT_CLUSTER) != 0) {
			pool_cache_put_paddr((struct pool_cache *)
			    m1->m_ext_storage.ext_arg,
			    m1->m_ext_storage.ext_buf, m1->m_ext_storage.ext_paddr);
			m1->m_type = MT_FREE;
			pool_cache_put(mb_cache, m1);
		} else if (m1->m_ext_storage.ext_free) {
			(*m1->m_ext_storage.ext_free)(m1,
			    m1->m_ext_storage.ext_buf, m1->m_ext_storage.ext_size,
			    m1->m_ext_storage.ext_arg);
			/*
			 * m1 is already freed by the ext_free callback.
			 */
		} else {
			free(m1->m_ext_storage.ext_buf, m1->m_ext_storage.ext_type);
			m1->m_type = MT_FREE;
			pool_cache_put(mb_cache, m1);
		}
	}
	if (__predict_false(m1 != m)) {
		KASSERT(m->m_ext_storage.ext_refcnt == 0);
		m->m_type = MT_FREE;
		pool_cache_put(mb_cache, m);
	}
}

I'm now running this in production on our gateway, and it /appears/ to solve 
our mbuf leak problems.

Comments?


Home | Main Index | Thread Index | Old Index