Subject: Re: mbuf external storage sharing
To: YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>
From: Jonathan Stone <jonathan@dsg.stanford.edu>
List: tech-net
Date: 10/04/2004 13:57:18
In message <1096713130.954835.12708.nullmailer@yamt.dyndns.org>,
YAMAMOTO Takashi writes:

>hi,
>
>the attached diffs are to change the way to share mbuf external storage.
>with the current linked list method, it's difficult to be mp-safe
>without having a global lock, because flags etc are not really shared.
>
>comments?

At a strategic level, rather than a comment-in-detail on your proposed
patch (to which my comment may not apply):

Rather than starting _de novo_, have you looked closely at how
FreeBSD-5 has evolved their TCP stack and the underlying network code
to support fine-grained locking?

I believe (though I may be wrong) that the FreeBSD-5 work has had
considerable input from the BSDi code; and considerable subsequent
retuning. For example: 

* splitting send- and receive-specific flags out of so_state, so that
  so_state keeps the socket-level flags and a new per-sockbuf flags
  field, sb_state, holds the rest;

* separate locked and unlocked versions of sbappend(), so that
  (for example) an so[rw]wakeup followed by an sbappend() can grab the
  lock once, perform both append and wakeup operations, then unlock.
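
To make the second point concrete, here is a minimal userland sketch of
the locked/unlocked split. It is not FreeBSD's code: the struct, the
`_sketch` function names, and the `sb_held` debugging flag are all
hypothetical, and a pthread mutex stands in for the kernel lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical, simplified socket buffer; not the real struct sockbuf. */
struct sockbuf_sketch {
    pthread_mutex_t sb_mtx;     /* protects every field below */
    int             sb_held;    /* debugging aid: 1 while sb_mtx is held */
    size_t          sb_cc;      /* bytes queued */
    int             sb_wakeups; /* stand-in for waking a sleeping reader */
};

static void sb_init(struct sockbuf_sketch *sb)
{
    pthread_mutex_init(&sb->sb_mtx, NULL);
    sb->sb_held = 0;
    sb->sb_cc = 0;
    sb->sb_wakeups = 0;
}

static void sb_lock(struct sockbuf_sketch *sb)
{
    pthread_mutex_lock(&sb->sb_mtx);
    sb->sb_held = 1;
}

static void sb_unlock(struct sockbuf_sketch *sb)
{
    sb->sb_held = 0;
    pthread_mutex_unlock(&sb->sb_mtx);
}

/* Variant for callers that already hold sb_mtx. */
static void sbappend_locked_sketch(struct sockbuf_sketch *sb, size_t len)
{
    assert(sb->sb_held);
    sb->sb_cc += len;
}

/* Convenience wrapper for callers that do not hold the lock. */
static void sbappend_sketch(struct sockbuf_sketch *sb, size_t len)
{
    sb_lock(sb);
    sbappend_locked_sketch(sb, len);
    sb_unlock(sb);
}

/* Wakeup variant for callers that already hold sb_mtx. */
static void sowakeup_locked_sketch(struct sockbuf_sketch *sb)
{
    assert(sb->sb_held);
    sb->sb_wakeups++;
}

/* Append and wake up under a single lock acquisition. */
static void sb_append_and_wakeup(struct sockbuf_sketch *sb, size_t len)
{
    sb_lock(sb);
    sbappend_locked_sketch(sb, len);
    sowakeup_locked_sketch(sb);
    sb_unlock(sb);
}
```

The point of the split is that a caller doing several operations on the
same buffer pays for one lock/unlock pair instead of one per operation.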

I'm not saying your diff is not useful and good. But there is
*considerable* prior art in adding fine-grained SMP locking to the BSD
TCP stack. I think it would be a mistake not to consult that prior art, first.

That said: in the specific case of mbufs, I believe FreeBSD 5.x uses a
replacement for the traditional mbuf/mbuf-cluster allocation, with
per-CPU caches which solve two well-known problems: contention over
allocation locks, and ping-ponging the underlying memory objects
between CPUs. For more details, see the BSDCan 2004 paper:
http://www.unixdaemons.com/~bmilekic/netbuf_bmilekic.pdf.
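
As a rough illustration of the per-CPU cache idea only (this is not the
allocator described in that paper), a userland sketch might look like
the following. Every name here is hypothetical, the CPU number is passed
in explicitly rather than read from the scheduler, and a fixed static
array stands in for the backing memory.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NCPU_SKETCH  4    /* assumed CPU count */
#define NBUF_SKETCH  8    /* assumed size of the backing store */
#define BUFSZ_SKETCH 256  /* assumed buffer size */

struct buf_sketch {
    struct buf_sketch *next;
    char               data[BUFSZ_SKETCH];
};

/* A free list with its own lock. */
struct cache_sketch {
    pthread_mutex_t    lock;
    struct buf_sketch *free_list;
};

static struct cache_sketch percpu_cache[NCPU_SKETCH];
static struct cache_sketch global_pool;
static struct buf_sketch   backing[NBUF_SKETCH];

static struct buf_sketch *cache_pop(struct cache_sketch *c)
{
    struct buf_sketch *b;

    pthread_mutex_lock(&c->lock);
    b = c->free_list;
    if (b != NULL)
        c->free_list = b->next;
    pthread_mutex_unlock(&c->lock);
    return b;
}

static void cache_push(struct cache_sketch *c, struct buf_sketch *b)
{
    pthread_mutex_lock(&c->lock);
    b->next = c->free_list;
    c->free_list = b;
    pthread_mutex_unlock(&c->lock);
}

/* Set up empty per-CPU caches and put all buffers in the global pool. */
static void caches_init(void)
{
    int i;

    pthread_mutex_init(&global_pool.lock, NULL);
    global_pool.free_list = NULL;
    for (i = 0; i < NCPU_SKETCH; i++) {
        pthread_mutex_init(&percpu_cache[i].lock, NULL);
        percpu_cache[i].free_list = NULL;
    }
    for (i = 0; i < NBUF_SKETCH; i++)
        cache_push(&global_pool, &backing[i]);
}

/* Try the caller's own cache first; fall back to the shared pool. */
static struct buf_sketch *buf_alloc(int cpu)
{
    struct buf_sketch *b;

    assert(cpu >= 0 && cpu < NCPU_SKETCH);
    b = cache_pop(&percpu_cache[cpu]);
    if (b == NULL)
        b = cache_pop(&global_pool);
    return b;
}

/* Return the buffer to the freeing CPU's cache, not the shared pool. */
static void buf_free(int cpu, struct buf_sketch *b)
{
    assert(cpu >= 0 && cpu < NCPU_SKETCH);
    cache_push(&percpu_cache[cpu], b);
}
```

In the common case a CPU touches only its own lock and reuses a buffer
it freed recently, which is how this kind of design reduces both lock
contention and movement of buffers between CPUs.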

I have no idea whether we want to follow the directions in that paper
(either mbuf-level API or underlying implementation); but we should at
least think about the quantitative data.