Subject: Re: mbuf external storage sharing
To: YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>
From: Jonathan Stone <jonathan@dsg.stanford.edu>
List: tech-net
Date: 11/09/2004 12:51:36
In message <1100031936.769999.2394.nullmailer@yamt.dyndns.org>,
YAMAMOTO Takashi writes:

>> I repeat my own question: what `mapping state' do you need to change,
>
>state of the lazy mapping of loaned pages.  pages are initially unmapped
>and then mapped into the kernel address space when/if needed.
>at that point, the storage might be shared by mbufs.

To me, that seems a very *very* minor consideration, compared to
allocation of cluster+header for receive buffers, and setting up those
buffers for receive DMA.  We _cannot_, repeat _CANNOT_, let those
receive buffers cross page boundaries, or we will (a) suffer gross
inefficiencies on some platforms, or (b) tickle DMA bugs with some
NICs.
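
Just so we are talking about the same thing, here is roughly what I
mean by cluster+header allocation and receive-DMA setup (a sketch
only; the function and DMA-map names are made up, not from any
particular driver):

/*
 * Sketch only: allocate one header mbuf plus one cluster for a
 * receive descriptor and map it for DMA.  The cluster comes from the
 * cluster pool, so it never straddles a page boundary, which is
 * exactly the property we cannot afford to lose.
 */
#include <sys/param.h>
#include <sys/errno.h>
#include <sys/mbuf.h>

#include <machine/bus.h>

static int
example_add_rxbuf(bus_dma_tag_t dmat, bus_dmamap_t rxmap, struct mbuf **mp)
{
	struct mbuf *m;
	int error;

	MGETHDR(m, M_DONTWAIT, MT_DATA);
	if (m == NULL)
		return (ENOBUFS);
	MCLGET(m, M_DONTWAIT);
	if ((m->m_flags & M_EXT) == 0) {
		m_freem(m);
		return (ENOBUFS);
	}
	/* Offer the whole cluster to the chip; length is fixed up at RX. */
	m->m_len = m->m_pkthdr.len = MCLBYTES;

	/* Map the cluster for receive DMA. */
	error = bus_dmamap_load_mbuf(dmat, rxmap, m, BUS_DMA_NOWAIT);
	if (error != 0) {
		m_freem(m);
		return (error);
	}
	bus_dmamap_sync(dmat, rxmap, 0, rxmap->dm_mapsize,
	    BUS_DMASYNC_PREREAD);
	*mp = m;
	return (0);
}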

Given that, I don't see offhand how you can allocate
physically-contiguous cluster+small headers without wasting
unacceptably large amounts of KVA space.
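
To put rough numbers on that (PAGE_SIZE and MCLBYTES are the usual
i386 values; the 256-byte `small header' is purely an assumed size for
illustration):

/*
 * Back-of-the-envelope numbers, using PAGE_SIZE = 4096 and
 * MCLBYTES = 2048, and assuming a 256-byte "small header" placed
 * adjacent to each cluster:
 *
 *   today:             2 * 2048         = 4096   (2 clusters per page)
 *   adjacent headers:  2 * (256 + 2048) = 4608   > 4096
 *
 * So either the second cluster straddles the page boundary (exactly
 * what receive DMA cannot tolerate), or each cluster keeps its
 * MCLBYTES alignment and only one header+cluster pair fits per page,
 * wasting 4096 - 2304 = 1792 bytes of every page of KVA.
 */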

>> and exactly how does placing small mbufs physically adjacent to the
>> large external mbufs help you achieve this?
>
>sorry, i don't understand what you're saying here.

I don't know how else to say it. The way I read your proposed patch,
you want to allocate two objects: a small `header' and an external
mbuf, at adjacent addresses in KVA space.  To me, that seems
self-evidently a bad idea. So I'm asking, just what does the
KVA-adjacency of the small header and its associated external mbuf
help you to achieve?

[...]

>> As I believe I said repeatedly: that is not what Milekic's BSDcon
>> paper appears to say.  (To me, the paper seems to say there is a
>> single `keg' which can allocate external storage or header mbufs, with
>> enough space reserved in the zone headers to implement per-slab-object
>> refcounts.
>
>these refcounts are the "small headers" I said above.

Are you *absolutely* sure about that?  That's not what the paper
appears to say. According to the paper, they are solely refcounts, kept
in space set aside by the FreeBSD UMA/zone allocator and exposed,
through a refcount-only API, as refcounts for suitable external mbufs.
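
Schematically, what I read the paper as describing is something like
the following (my own sketch of the idea, with made-up names; not
FreeBSD's actual code):

/*
 * Conceptual sketch only: the slab allocator sets aside one counter
 * per cluster-sized object when a slab is initialized, and exposes
 * nothing but reference/unreference operations on it.  The counters
 * are not mbuf headers of any kind.
 */
#include <sys/types.h>

struct example_slab {
	char	*es_base;	/* first object in the slab */
	u_int	 es_objsize;	/* e.g. MCLBYTES */
	u_int	*es_refcnt;	/* one counter per object */
};

static u_int *
example_refcnt(struct example_slab *sl, void *obj)
{

	return (&sl->es_refcnt[((char *)obj - sl->es_base) / sl->es_objsize]);
}

static void
example_ext_ref(struct example_slab *sl, void *obj)
{

	(*example_refcnt(sl, obj))++;		/* atomic in real life */
}

static int
example_ext_unref(struct example_slab *sl, void *obj)
{

	/* Returns nonzero when the last reference goes away. */
	return (--(*example_refcnt(sl, obj)) == 0);
}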



>> Those refcounts are then used to implement
>> per-external-storage refcounts. The FreeBSD `m_getcl()' API then
>> atomically allocates a header mbuf and a cluster; the per-CPU caches
>> return suitable objects. But I could be misreading it.)
>
>it seems that you're confusing clusters (m_getcl) with
>external storage (m_extadd).

Nope.  I am reading the BSDcon paper and looking at FreeBSD-5 source
code via cvsweb.
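
To be explicit about which interfaces I mean, here are the two in the
shortest possible form (fragments only; the buffer, length, free
routine, and argument passed to MEXTADD() are placeholders):

/* FreeBSD-5: header mbuf and cluster allocated together, from the
 * per-CPU caches.  (Fragment, in FreeBSD terms.) */
	struct mbuf *m;

	m = m_getcl(M_DONTWAIT, MT_DATA, M_PKTHDR);

/* NetBSD: attach arbitrary driver-supplied external storage to an
 * ordinary header mbuf.  (Fragment; drv_buf, drv_buflen, drv_free,
 * drv_arg are placeholders.) */
	MGETHDR(m, M_DONTWAIT, MT_DATA);
	if (m != NULL)
		MEXTADD(m, drv_buf, drv_buflen, MT_DATA, drv_free, drv_arg);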


>> If, on the other hand, FreeBSD-5 is fragmenting the memory backing
>> external mbuf storage, then that strikes me as a poor design
>> choice. But it strikes me as a much, much worse choice for NetBSD, and
>> the wider range of CPUs we run on.
>
>then, do you have a better alternative?

I'd say, if something is a bad idea, don't do it.  That's why I keep
asking: just what do you perceive as the upside of KVA-adjacent
`small' headers?  Because to me it sounds like a bad idea.

Can you perhaps draw a diagram of, say, two 4k pages (convenient for
i386), plus any associated header storage, and then explain how, under
your envisioned changes, these pages and the associated mbuf storage
would be carved up into external mbuf storage with KVA-adjacent small
headers, and any regular headers?  What does the picture look like for
a NIC driver using external storage (say, an mbuf chain of MCLBYTES
clusters) to implement jumbo frames?
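
For reference, the picture today looks roughly like this (a sketch
with the error handling pared down, not actual driver code):

/*
 * Sketch only: cover a jumbo frame with a chain of ordinary MCLBYTES
 * clusters, the way a driver without a jumbo-sized buffer pool does
 * it today.  The caller fills in m_pkthdr.rcvif.
 */
#include <sys/param.h>
#include <sys/mbuf.h>

static struct mbuf *
example_jumbo_chain(int totlen)
{
	struct mbuf *top, *m, **mp;
	int len, resid;

	if (totlen <= 0)
		return (NULL);
	top = NULL;
	mp = &top;
	for (resid = totlen; resid > 0; resid -= len) {
		if (top == NULL)
			MGETHDR(m, M_DONTWAIT, MT_DATA);
		else
			MGET(m, M_DONTWAIT, MT_DATA);
		if (m == NULL)
			goto bad;
		MCLGET(m, M_DONTWAIT);
		if ((m->m_flags & M_EXT) == 0) {
			m_free(m);
			goto bad;
		}
		len = (resid > MCLBYTES) ? MCLBYTES : resid;
		m->m_len = len;
		*mp = m;
		mp = &m->m_next;
	}
	top->m_pkthdr.len = totlen;
	return (top);
 bad:
	if (top != NULL)
		m_freem(top);
	return (NULL);
}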

I would really like to see what you want to do, and thus to see just
where the KVA-adjacent "tiny headers" help you. Because to me, so far,
they sound like an unmitigated disaster for the applications I care
about (e.g., high-throughput NFS).