tech-net archive

Re: mbuf cluster leak?



> WARNING: mclpool limit reached; increase kern.mbuf.nmbclusters
Welcome to the club.

I identified and fixed so many (potential) mbuf cluster leaks that I probably 
forgot about most of them. You may browse through tech-net if you like.

The biggest one is in lltable_drop_entry_queue(), which is completely broken 
since no one ever increments la_numheld. Patch (to sys/net/if_llatbl.c) attached.

The next most suspicious piece of code is in m_ext_free(), which I have mostly
rewritten (see attached). For that version to work, you need (or so I think)
	(n)->m_ext_storage.ext_refcnt = 0;				\
after
	(n)->m_flags |= ((o)->m_flags & M_EXTCOPYFLAGS);		\
in sys/kern/uipc_mbuf.c
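
In diff form, just to show where it goes (context is the line quoted above;
the exact surroundings in your tree may of course differ slightly):

 	(n)->m_flags |= ((o)->m_flags & M_EXTCOPYFLAGS);		\
+	(n)->m_ext_storage.ext_refcnt = 0;				\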

I have those patches for -8 too in case that's relevant to you.

I think there are other (potential) leaks in IPF, but that can't be your 
problem.

I also have (posted to tech-net) a (hacky) utility to dump kernel pools 
(so you can analyze what's in those orphaned mbuf clusters).
And I have a large patchset adding instrumentation to the mbuf life cycle 
(by writing magic values to m_pkthdr.pad0) so I could (at least for the code 
paths relevant to me) see where an mbuf was last used.
As far as I remember, combining both let me find the la_numheld bug.
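
To illustrate the idea (only a sketch, not the actual patchset; the macro
name and the magic value below are made up):

/* Sketch only: stamp a per-call-site magic value into the pkthdr pad. */
#define MBUF_TRACE(m, where_magic)					\
do {									\
	if (((m)->m_flags & M_PKTHDR) != 0)				\
		(m)->m_pkthdr.pad0 = (where_magic);			\
} while (/*CONSTCOND*/ 0)

/* ... then at each call site of interest, e.g.: */
MBUF_TRACE(m, 0xd1);

Orphaned clusters found in a pool dump then carry (in pad0) a hint about
where the mbuf was last handled.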

Good luck!
Index: sys/net/if_llatbl.c
===================================================================
RCS file: /cvsroot/src/sys/net/if_llatbl.c,v
retrieving revision 1.35
diff -u -r1.35 if_llatbl.c
--- sys/net/if_llatbl.c	19 Nov 2022 08:00:51 -0000	1.35
+++ sys/net/if_llatbl.c	19 Feb 2025 14:44:39 -0000
@@ -322,6 +322,8 @@
 	LLE_WLOCK_ASSERT(lle);
 
 	pkts_dropped = 0;
+
+#if 0 /* XXX la_numheld isn't (properly) incremented */
 	while ((lle->la_numheld > 0) && (lle->la_hold != NULL)) {
 		next = lle->la_hold->m_nextpkt;
 		m_freem(lle->la_hold);
@@ -333,6 +335,15 @@
 	KASSERTMSG(lle->la_numheld == 0,
 		"la_numheld %d > 0, pkts_dropped %zd",
 		 lle->la_numheld, pkts_dropped);
+#else
+	while (lle->la_hold != NULL) {
+		next = lle->la_hold->m_nextpkt;
+		m_freem(lle->la_hold);
+		lle->la_hold = next;
+		pkts_dropped++;
+	}
+	lle->la_numheld = 0;
+#endif
 
 	return (pkts_dropped);
 }

static void
m_ext_free(struct mbuf *m)
{
	struct mbuf *m1 = m->m_ext_ref; /* may well be m itself */
	u_int refcnt;

	KASSERT((m->m_flags & M_EXT) != 0);
	KASSERT(MEXT_ISEMBEDDED(m1));
	KASSERT((m1->m_flags & M_EXT) != 0);
	KASSERT((m->m_flags & M_EXT_CLUSTER) ==
	    (m1->m_flags & M_EXT_CLUSTER));
	if (__predict_false(m->m_type == MT_FREE)) {
		panic("mbuf %p already freed", m);
	}

	if (__predict_true(m1->m_ext_storage.ext_refcnt == 1)) {
		/* We hold the only reference, so no atomic op is needed. */
		refcnt = m1->m_ext_storage.ext_refcnt = 0;
	} else {
#ifndef __HAVE_ATOMIC_AS_MEMBAR
		membar_release();
#endif
		refcnt = atomic_dec_uint_nv(&m1->m_ext_storage.ext_refcnt);
	}
	if (refcnt == 0) {
		/* Last reference dropped, release the external storage. */
#ifndef __HAVE_ATOMIC_AS_MEMBAR
		membar_acquire();
#endif
		if ((m1->m_flags & M_EXT_CLUSTER) != 0) {
			pool_cache_put_paddr(mcl_cache,
			    m1->m_ext_storage.ext_buf, m1->m_ext_storage.ext_paddr);
			m1->m_type = MT_FREE;
			pool_cache_put(mb_cache, m1);
		} else if (m1->m_ext_storage.ext_free) {
			(*m1->m_ext_storage.ext_free)(m1,
			    m1->m_ext_storage.ext_buf, m1->m_ext_storage.ext_size,
			    m1->m_ext_storage.ext_arg);
			/*
			 * m1 is already freed by the ext_free callback.
			 */
		} else {
			free(m1->m_ext_storage.ext_buf, 0);
			m1->m_type = MT_FREE;
			pool_cache_put(mb_cache, m1);
		}
	}
	if (__predict_false(m1 != m)) {
		/*
		 * m was only referencing m1's storage; its own embedded
		 * refcnt must be 0 here (hence the uipc_mbuf.c change
		 * mentioned above), and m itself goes back to the pool.
		 */
		KASSERT(m->m_ext_storage.ext_refcnt == 0);
		m->m_type = MT_FREE;
		m->m_data = NULL;
		pool_cache_put(mb_cache, m);
	}
}

