tech-net: Re: mbuf external storage sharing

Subject: Re: mbuf external storage sharing
To: YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>
From: Jonathan Stone <jonathan@dsg.stanford.edu>
List: tech-net
Date: 10/08/2004 08:42:15
In message <1097202573.157728.1552.nullmailer@yamt.dyndns.org>,
YAMAMOTO Takashi writes:




>(assuming that you mean m_getcl,)

Oops. I thought I was using the same API name as Bosko Milekic's
BSDcon paper on FreeBSD-5?


>as i wrote in the previous mail,
>the optimization, which might be good idea, does not contradict but
>is independent to what my patch does.

Sorry, but I sitll think that's incorrect.  The optimizations I
envisage, if done well, eliminates any need for, or benefit from,
the majority of your patch.


>i don't understand what you mean, sorry.
>it seems that you're confused between different topics.
>
>there're (at least) two topics you're mixing up:
>
>- how to share/unshare mbuf external storage
>	it's what my patch attmpts to change.
>
>	both of my patch and freebsd5 allocate a small header which
>	contains a reference count.  no much difference here, afaik.
>
>	to manipulate the reference count, freebsd5 uses atomic_ operations
>	and my patch uses spinlock.  i used spinlock just because we don't
>	have MI-exposed atomic_ operations.

But that's not really what I'm getting at.  Sure, to add a reference
to an existing external cluster, yes, you need to atomically increment
the refcount, and we can discuss the merits of spinlocks protecting an
increment, vs. explict atomic-increment :-/.

My point, however, is that when I look at the actual uses of your
fused ``atomically get an tiny-mbuf and external mbuf'', those uses
were in NIC drivers. Specifically, for acquiring new DMA buffers for
receive-side DMA queues.

In *that* case, if we allocate from per-CPU pools in the common case,
you don't need spinlocks or atomic increments *at all*, for the
initial allocation. And there are good, and very well-known reasons,
why having per-CPU pools is a Good Thing.




>	(it might be good to have MI-exposed atomic_ operations like
>	freebsd and linux.  but it's a separate topic.)
>
>- how to allocate/free mbufs and clusters
>	freebsd5 has some optimizations, including m_getcl.
>	although they might be good and i have no objections against them,
>	they're orthogonal to my patch.

You keep saying the issues are orthogonal.  To me, it seems quite
plain that they are not orthogonal. If we do per-CPU pools, then we
can do efficient SMP-safe allocation *without* fusing a tiny-mbuf
header into a smaller variant of a cluster mbuf. (If your goal is
space efficiency, there is prior art for better approaches.)

We are clearly talking at cross-purposes, but I still don't see where
or why.  Perhaps you could explain once more what the intended purpose
of your patch is?