Subject: Re: mbuf external storage sharing
To: None <jonathan@dsg.stanford.edu>
From: YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>
List: tech-net
Date: 11/10/2004 07:28:28
hi,

> >> I repeat my own question: what `mapping state' do you need to change,
> >
> >the state of lazy mapping of loaned pages.  pages are initially unmapped
> >and then mapped into the kernel address space when/if needed.
> >at that point, the storage might be shared by multiple mbufs.
> 
> To me, that seems a very *very* minor consideration, compared to
> allocation of cluster+header for receive buffers, and setting up those
> buffers for receive DMA.  We _cannot_, repeat _CANNOT_, let those
> receive buffers cross page boundaries, or we will (a) suffer gross
> inefficiencies on some platforms, or (b) tickle DMA bugs with some
> NICs.

it might be minor for you.  but it isn't for me.

> Given that, I don't see offhand how you can allocate
> physically-contiguous cluster+small headers, without wasting
> unacceptably large amounts of KVA space.

they don't need to be physically contiguous.  see below.

> >> and exactly how does placing small mbufs physically adjacent to the
> >> large external mbufs help you achieve this?
> >
> >sorry, i don't understand what you're saying here.
> 
> I don't know how else to say it.  The way I read your proposed patch,
> you want to allocate two objects: a small `header' and an external
> mbuf, at adjacent addresses in KVA space.  To me, that seems
> self-evidently a bad idea.  So I'm asking, just what does the
> KVA-adjacency of the small header and its associated external mbuf
> help you to achieve?

are you talking about M_EXT_HDREMBEDDED in my patch?
if so, it isn't the fundamental point of the patch.
i added it just because i didn't want to allocate two objects.

> >> As I believe I said repeatedly: that is not what Milekic's BSDCon
> >> paper appears to say.  (To me, the paper seems to say there is a
> >> single `keg' which can allocate external storage or header mbufs, with
> >> enough space reserved in the zone headers to implement per-slab-object
> >> refcounts.)
> >
> >these refcounts are the "small headers" i said above.
> 
> Are you *absolutely* sure about that?  That's not what the paper
> appears to say.

i'm not sure what you're claiming.
they're small headers because they're headers and they're small.
and, of course, i'm absolutely sure what i meant with "small headers".

> According to the paper, they are solely refcounts, in
> space set aside by the FreeBSD UMA/zone allocators, and exposed via a
> refcount-only API as refcounts for suitable external mbufs.

for clusters, yes.

> >> Those refcounts are then used to implement
> >> per-external-storage refcounts.  The FreeBSD `m_getcl()' API then
> >> atomically allocates a header mbuf and a cluster; the per-CPU caches
> >> return suitable objects.  But I could be misreading it.)
> >
> >it seems that you're confusing clusters (m_getcl) with
> >external storage (m_extadd).
> 
> Nope.  I am reading the BSDcon paper and looking at FreeBSD-5 source
> code via cvsweb.

then you know that m_extadd normally doesn't use the refcounts provided by uma?

> >> If, on the other hand, FreeBSD-5 is fragmenting the memory backing
> >> external mbuf storage, then that strikes me as a poor design
> >> choice. But it strikes me as a much, much worse choice for NetBSD, and
> >> the wider range of CPUs we run on.
> >
> >then, do you have better alternative?
> 
> I'd say, if something is a bad idea, don't do it.  That's why I keep
> asking: just what do you perceive as the upside of KVA-adjacent
> `small' headers?  Because to me it sounds like a bad idea.
> 
> Can you perhaps draw a diagram of, say, two 4k pages (convenient for
> i386), plus any associated header storage, and then explain how (under
> your envisioned changes) these pages and the associated mbuf
> storage would be carved up into external mbuf storage with
> KVA-adjacent small headers, and any regular headers?  What does the
> picture look like for a NIC driver using external storage (say, an
> mbuf chain of MCLBYTES clusters) to implement jumbo frames?

again, "kva-adjacent" is not my point.

my point is: in order to implement lazy mapping of
external storage in a somewhat mp-safe way, it's better to really share
some portion of the mbuf header (unlike the current linked-list method).
the patch i proposed is one way to achieve that.
another way might be to directly share part of the original mbuf header,
as i said in one of the previous mails.  (i have a rough patch;
do you want to see it?)
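to illustrate what i mean by "really share some portion of the header":
below is a userland sketch (not actual NetBSD or patch code; the names
ext_desc, ext_map etc. are made up for illustration).  all mbufs pointing
at one piece of external storage hold the same descriptor, so the refcount
and the lazy-mapping state live in exactly one place, instead of being
chained through a linked list of per-mbuf copies:

```c
/*
 * Hypothetical sketch: one shared descriptor per piece of external
 * storage, embedding both the refcount and the lazy-mapping state.
 * malloc()/free() stand in for mapping/unmapping a loaned page.
 */
#include <assert.h>
#include <stdlib.h>

struct ext_desc {
	void	*ed_kva;	/* NULL until first mapped */
	int	 ed_refcnt;	/* number of mbufs sharing this storage */
};

/* stand-in for actually entering the page into kernel address space */
static void *
map_storage(void)
{
	return malloc(4096);
}

/* map lazily: only the first consumer that needs CPU access pays */
static void *
ext_map(struct ext_desc *ed)
{
	if (ed->ed_kva == NULL)
		ed->ed_kva = map_storage();
	return ed->ed_kva;
}

static void
ext_ref(struct ext_desc *ed)
{
	ed->ed_refcnt++;
}

static void
ext_unref(struct ext_desc *ed)
{
	if (--ed->ed_refcnt == 0) {
		free(ed->ed_kva);	/* would be unmap + free in-kernel */
		ed->ed_kva = NULL;
	}
}
```

because the descriptor is shared rather than copied per-mbuf, the
"map once, tear down on last unref" logic needs only a single lock or
atomic op on the descriptor, which is the mp-safety point above.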

> I would really like to see what you want to do, and thus to see just
> where the KVA-adjacent "tiny headers" help you. Because to me, so far,
> they sound like an unmitigated disaster for the applications I care
> about (e.g., high-throughput NFS).

i think nfsd is one of the subsystems which can gain the most from
lazy mapping.  currently, every time it sends out
data from the page cache, it needs to map the pages into kva.
if you really care about nfs performance, i don't see why you think
it's a "very *very* minor consideration".

YAMAMOTO Takashi