Subject: Re: mbuf external storage sharing
To: None <jonathan@dsg.stanford.edu>
From: YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>
List: tech-net
Date: 02/01/2005 21:56:14
hi,

Jonathan, given no responses, can i assume you're convinced?

YAMAMOTO Takashi


> hi,
> 
> > >> I repeat my own question: what `mapping state' do you need to change,
> > >
> > >state of lazy mapping of loaned pages.  pages're initially unmapped
> > >and then mapped into kernel address space when/if needed.
> > >at that point, the storage might be shared by mbufs.
> > 
> > To me, that seems a very *very* minor consideration, compared to
> > allocation of cluster+header for receive buffers, and setting up those
> > buffers for receive DMA.  We _cannot_, repeat _CANNOT_ let those
> > receive buffers cross page boundaries, or we will (a) suffer gross
> > inefficiencies on some platforms, or (b) tickle DMA bugs with some
> > NICs.
> 
> it might be minor for you.  but it isn't for me.
> 
> > Given that, I don't see offhand how you can allocate
> > physically-contiguous cluster+small headers, without wasting
> > unacceptably large amounts of KVA space.
> 
> they don't need to be physically contiguous.  see below.
> 
> > >> and exactly how does placing small mbufs physically adjacent to the
> > >> large external mbufs help you achieve this?
> > >
> > >sorry, i don't understand what you're saying here.
> > 
> > I don't know how else to say it. The way I read your proposed patch,
> > you want to allocate two objects: a small `header' and an external
> > mbuf, at adjacent addresses in KVA space.  To me, that seems
> > self-evidently a bad idea. So I'm asking, just what does the
> > KVA-adjacency of the small header and its associated external mbuf
> > help you to achieve?
> 
> are you talking about M_EXT_HDREMBEDDED in my patch?
> if so, it isn't the fundamental point of the patch.
> i added it just because i didn't want to allocate two objects.
> 
> > >> As I believe I said repeatedly: that is not what Milekic's BSDcon
> > >> paper appears to say.  (To me, the paper seems to say there is a
> > >> single `keg' which can allocate external storage or header mbufs; with
> > >> enough space reserved in the zone headers to implement per-slab-object
> > >> refcounts.
> > >
> > >these refcounts are "small headers" I said above.
> > 
> > Are you *absolutely* sure about that?  That's not what the paper
> > appears to say.
> 
> i'm not sure what you're claiming.
> they're small headers because they're headers and they're small.
> and, of course, i'm absolutely sure what i meant by "small headers".
> 
> > According to the paper, they are solely refcounts, in
> > space left aside by the FreeBSD UMA/zone allocators, and exposed via a
> > refcount-only API as refcounts for suitable external mbufs.
> 
> for clusters, yes.
> 
> > >> Those refcounts are then used to implement
> > >> per-external-storage refcounts. The FreeBSD `m_getcl()' API atomically
> > >> then allocates a header mbuf and a cluster; the per-CPU caches return
> > >> suitable objects. But I could be misreading it.)
> > >
> > >it seems that you're confusing clusters (m_getcl) with
> > >external storage (m_extadd).
> > 
> > Nope.  I am reading the BSDcon paper and looking at FreeBSD-5 source
> > code via cvsweb.
> 
> then, you know m_extadd normally doesn't use the refcounts provided by uma?
> 
> > >> If, on the other hand, FreeBSD-5 is fragmenting the memory backing
> > >> external mbuf storage, then that strikes me as a poor design
> > >> choice. But it strikes me as a much, much worse choice for NetBSD, and
> > >> the wider range of CPUs we run on.
> > >
> > >then, do you have better alternative?
> > 
> > I'd say, if something is a bad idea, don't do it.  That's why I keep
> > asking: just what do you perceive as the upside of KVA-adjacent
> > `small' headers?  Because to me it sounds like a bad idea.
> > 
> > Can you perhaps draw a diagram of, say, two 4k pages (convenient for
> > i386), plus any associated header storage, and then explain (under
> > your envisioned changes) how these pages and the associated mbuf
> > storage, would be carved up into external mbuf storage with
> > KVA-adjacent small headers, and any regular headers? What does the
> > picture look like for a NIC driver using external storage (say, an
> > mbuf chain of MCLBYTES-clusters) to implement jumbo frames?
> 
> again, "kva-adjacent" is not my point.
> 
> my point is, in order to implement lazy-mapping of
> external storage in a somewhat mp-safe way, it's better to really share
> some portion of the mbuf header (unlike the current linked-list method).
> the patch i proposed is a way to achieve it.
> another way might be to directly-share the part of the original mbuf header
> as i said in one of the previous mails.  (i have a rough patch.
> do you want to see?)
> 
> > I would really like to see what you want to do, and thus to see just
> > where the KVA-adjacent "tiny headers" help you. Because to me, so far,
> > they sound like an unmitigated disaster for the applications I care
> > about (e.g., high-throughput NFS).
> 
> i think nfsd is one of the subsystems which can gain the most from
> lazy-mapping.  currently, every time it sends out
> data from page cache, it needs to map them into kva.
> if you really care about nfs performance, i don't see why you think
> it's a "very *very* minor consideration".
> 
> YAMAMOTO Takashi
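
For readers following the thread, here is a minimal userland sketch of the idea argued for above: mbufs that reference the same external storage point at one shared, reference-counted piece of state (instead of being chained on a linked list), so that a lazy mapping done through any one of them is seen by all of them. Every name here (ext_shared, mbuf_sketch, ext_attach, ext_share, ext_map_lazy, ext_release) is hypothetical and is not taken from the patch or from any kernel source. The "mapping" is just a pointer assignment standing in for entering loaned pages into KVA, and locking is omitted, so this sketch is not mp-safe as written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names throughout; none of this is from the actual patch. */

/* State shared by every mbuf that references the same external storage. */
struct ext_shared {
	unsigned refcnt;	/* number of mbufs referencing this storage */
	void *pages;		/* stands in for loaned, not-yet-mapped pages */
	void *kva;		/* NULL until the storage is first mapped */
	unsigned nmaps;		/* how many times the mapping was actually done */
};

/* Per-mbuf part: only a pointer to the shared state. */
struct mbuf_sketch {
	struct ext_shared *ext;
};

/* Attach unmapped external storage to m. Returns 0 on success, -1 on failure. */
static int
ext_attach(struct mbuf_sketch *m, void *pages)
{
	struct ext_shared *e = calloc(1, sizeof(*e));

	if (e == NULL)
		return -1;
	e->refcnt = 1;
	e->pages = pages;
	m->ext = e;
	return 0;
}

/* Make dst reference the same storage (and the same mapping state) as src. */
static void
ext_share(struct mbuf_sketch *dst, const struct mbuf_sketch *src)
{
	assert(src->ext != NULL && src->ext->refcnt > 0);
	src->ext->refcnt++;
	dst->ext = src->ext;
}

/* Map the storage on first use; later callers reuse the existing mapping. */
static void *
ext_map_lazy(struct mbuf_sketch *m)
{
	struct ext_shared *e = m->ext;

	if (e->kva == NULL) {
		e->kva = e->pages;	/* stand-in for mapping pages into KVA */
		e->nmaps++;
	}
	return e->kva;
}

/* Drop m's reference; the shared state is freed with the last reference. */
static void
ext_release(struct mbuf_sketch *m)
{
	struct ext_shared *e = m->ext;

	m->ext = NULL;
	if (--e->refcnt == 0)
		free(e);
}
```

In this sketch, if mbuf a is attached to a buffer and mbuf b is then created with ext_share(&b, &a), a call to ext_map_lazy() through either one performs the mapping once and the other sees the same kva. A real kernel version would need a lock or atomic operations around refcnt and kva, which is the "somewhat mp-safe" part of the argument.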