tech-net: Re: mbuf external storage sharing

Subject: Re: mbuf external storage sharing
To: YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>
From: Jonathan Stone <jonathan@dsg.stanford.edu>
List: tech-net
Date: 10/07/2004 16:05:49

getting deglIn message <1096940580.134154.15681.nullmailer@yamt.dyndns.org>,
YAMAMOTO Takashi  writes:

>> Yamamoto-san, there seem to be two more-or-less unrelated changes in this pa
>tch:

Yamamoto-san,

First, My apologies for  my delay in responding to your message.

[...]
>> For it seems to contradict the
>> conclusions in the FreeBSD-5 paper I cited earlier.
>
>how contradict?
>allocating the little header is what exactly freebsd5 does.
>only difference is, freebsd5's header contains only reference count.

Going by Bosko Milekic's paper [I need to reread more recent source
code]: FreeBSD-5 uses m_clget() to atomically allocate one physically-noncontigous little header and one external mbuf (cluster mbuf).  

Both the little mbuf header (or even a tiny header-only mbuf, in other
approaches) and the cluster are allocated from a per-cpu bucket.
Also, each CPU has to disable recursive network interrupts in any
case.  Thus, if one designs the data structures carefully, one need
not acquire spinlocks on the mbuf-cluster pool [sic] or the
little-mbuf pool [sic]: there are no races between the CPU and itself,
so need for spinlocks.

[...]

>> >not relevant to my patch.
>> 
>> Yes and no: [....]

>be more specific, please.
>you can easily modify pool_cache to use per-cpu cache.
>how is it related to my patch?

I would prefer an approach whereby the per-cpu caches are race-free,
and thus the common case (i.e., atomically allocating both an mbuf
header and a cluster, satisifed from the per-cpu pool) need not
acquire per-mbuf spinlocks at all.  

That approach also generaalizes well to other allocations than for NIC
device-input DMA queues. (e.g.,sosend() in the non-loanout case?),
whereas I'm skeptical the approach in your patch will be applicable.

Last, I do appreciate your patch, and the thought that has gone into
it.  I fully support an m_clget()-stype interface, for an atomic "grab
a hdr-mbuf plus cluster" atomically, with efficient locking of the
joint operation.  I support making that operation MP-safe.  I just
happen to think we can do slightly better and more genarlly than in
your proposed patch.