Subject: Re: SMP re-entrancy in kernel drivers/"bottom half?"
To: Daniel Carosone <dan@geek.com.au>
From: Jonathan Stone <jonathan@dsg.stanford.edu>
List: tech-kern
Date: 02/22/2005 20:02:13
In message <20050222220910.GB430@bcd.geek.com.au>, Daniel Carosone writes:


>On Wed, Feb 23, 2005 at 07:05:30AM +0900, YAMAMOTO Takashi wrote:

>> i'm not sure if it's a good idea for today's cpus
>> as passing mbufs among cpus increases cache misses.
>
>That was the other thing in my mind as I read Jonathan's speculated
>motivation for the freebsd run to completion model.

As I believe I replied to Yamamoto-san: I take the point.  But I am
very skeptical that run-to-completion will actually help the case of a
single large flow (e.g., on a 10Gbit NIC that can sustain 10Gbit with
standard 1500-byte MTUs, or even the ~7.5 Gbit aggregate attainable
over PCI-X).
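
For scale, a back-of-the-envelope figure (my arithmetic, rounding
freely):

    10 Gbit/s / (1500 bytes * 8 bits/byte) ~= 833,000 packets/sec,

or roughly 1.2 microseconds per packet for *all* per-packet work,
locking included.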

The way I see it, you have two choices: either you serialize all
packets for a given flow onto a single CPU (thus denying any SMP
throughput gains for any single flow); or you pay horrendous per-packet
TCB locking costs, *plus* the large downsides of repeated out-of-order
delivery to TCP, due to multiple CPUs concurrently handling packets from
a single stream.  Personally, I don't find either approach tenable,
but I am willing to learn better from those who've actually done it.
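
To make the first option concrete, flow-to-CPU steering amounts to
something like the sketch below; the struct and the hash are mine, made
up for illustration, not taken from any existing stack:

#include <sys/types.h>
#include <stdint.h>

/*
 * Sketch of per-flow CPU steering: hash the TCP/IP 4-tuple so that
 * every packet of a given flow lands on the same CPU's input path.
 */
struct flowkey {
	uint32_t src, dst;		/* IP source/destination addresses */
	uint16_t sport, dport;		/* TCP source/destination ports */
};

static u_int
flow_hash_cpu(const struct flowkey *fk, u_int ncpu)
{
	uint32_t h;

	/* Fold the 4-tuple down to one word, then pick a CPU. */
	h = fk->src ^ fk->dst ^ ((uint32_t)fk->sport << 16) ^ fk->dport;
	h ^= h >> 16;
	return h % ncpu;
}

Every packet of a flow hashes to the same CPU, so ordering and locking
within the flow are trivial; but by the same token no single flow can
ever use more than one CPU's worth of protocol processing.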


If I had to split layers across multiple CPUs, on an x86, I'd probably
map the pages used for little-mbufs (for received packets) write-through,
so that any sharing on the mbuf header lines happens from memory rather
than lines dirty in another CPU's cache.  I did that once already, back
around NetBSD 1.3, though then for the outbound packet-buffer pages as well.
It made a measurable difference.
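
What I mean is roughly the following; this is illustration only, not
NetBSD's pmap interface -- the bit values are the architectural i386
PTE bits, and the function name is made up:

#include <stdint.h>

typedef uint32_t pt_entry_t;

#define PTE_P	0x001		/* present */
#define PTE_W	0x002		/* writable */
#define PTE_PWT	0x008		/* page write-through */

static void
map_mbuf_page_wt(pt_entry_t *pte, uint32_t pa)
{
	/*
	 * Map the page writable and write-through: a store to an mbuf
	 * header gets pushed out to memory, so a second CPU touching
	 * the same line reads it from memory rather than having to
	 * pull a dirty line out of the writer's cache.
	 */
	*pte = (pa & ~(uint32_t)0xfff) | PTE_P | PTE_W | PTE_PWT;
}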

On a related note, thinking about SMP, run-to-completion, and i386: I
queried whether we avoid whacking %cr3 when switching from one
kernel-only thread to another.  I'm told we used to, but that
optimization was lost during the LWP/SA changes; it's being worked on.
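
The check is roughly the following -- a sketch of the idea, not the real
cpu_switch(), and the helpers (lwp_is_kernel_only(), lwp_pdirpa()) are
names I've made up for illustration:

/*
 * Kernel-only threads all run on the kernel's page tables, so a switch
 * between two of them need not reload %cr3, and so need not pay for
 * the implicit TLB flush that comes with the reload.
 */
void
switch_pagetables(struct lwp *oldl, struct lwp *newl)
{
	if (lwp_is_kernel_only(oldl) && lwp_is_kernel_only(newl))
		return;			/* same page tables: skip the reload */

	lcr3(lwp_pdirpa(newl));		/* the %cr3 load flushes the TLB */
}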