Subject: Re: SMP re-entrancy in kernel drivers/"bottom half?"
To: Matt Thomas <matt@3am-software.com>
From: Jonathan Stone <jonathan@dsg.stanford.edu>
List: tech-kern
Date: 02/24/2005 13:38:09
In message <6.1.2.0.2.20050224125613.05087de0@localhost>, Matt Thomas writes:

[...]

>The primary purpose of TOE is to allow a CPU to deal with high-traffic
>flows in larger chunks than could be done on a per-packet basis.  This
>results in less overhead per packet and thus more CPU cycles for other
>things.

Hmmm. That's your take, but be advised that not everybody sees it
quite that way.  (Though, from conversation on e2e, many people
running servers with per-cpu-licensed software do see it that way!)

It's also not yet entirely clear that TOEs are going to be a winner
for general-purpose TCP stacks (as opposed to, for example, dedicated
"iSCSI HBA" devices). But I agree 100% about wanting to make our stack
amenable to TOE devices.

However, currently available 10GbE products don't offer much offload:
at most, large-send offload (aka TSO) and interrupt mitigation. It is
a fact that a single CPU[*] cannot keep up with such a device, and
that future single CPUs will not be that much faster than today's.
Thus, if we want to support the PCI-X-limited bandwidth that today's
hardware can achieve, the *only* viable option is to transform the
stack into a pipeline, in which different CPUs handle different
stages.
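
As a rough illustration of the pipeline idea (a userland sketch with
POSIX threads, not NetBSD kernel code), each stage below runs in its
own thread and hands packets to the next stage through a bounded
queue. Every name here (struct ring, driver_stage, protocol_stage,
run_pipeline) and every constant (64 slots, 1500-byte frames, 40-byte
headers) is an assumption made for the example. Nothing in it pins a
thread to a CPU; it only shows the hand-off structure that would let
the scheduler run the stages on different CPUs.

```c
#include <assert.h>
#include <pthread.h>

#define RING_SIZE     64    /* slots per inter-stage queue (arbitrary) */
#define END_OF_STREAM (-1)  /* sentinel: no more packets */
#define FRAME_LEN     1500  /* hypothetical packet length, bytes */
#define HDR_LEN       40    /* hypothetical IP + TCP header length */

/* Bounded queue connecting two adjacent pipeline stages. */
struct ring {
    int             slot[RING_SIZE];
    unsigned        head, tail, count;
    pthread_mutex_t lock;
    pthread_cond_t  not_empty, not_full;
};

struct stage { struct ring *in, *out; int npackets; };

static void ring_init(struct ring *r)
{
    r->head = r->tail = r->count = 0;
    pthread_mutex_init(&r->lock, NULL);
    pthread_cond_init(&r->not_empty, NULL);
    pthread_cond_init(&r->not_full, NULL);
}

/* Enqueue one value, blocking while the ring is full. */
static void ring_put(struct ring *r, int v)
{
    pthread_mutex_lock(&r->lock);
    while (r->count == RING_SIZE)
        pthread_cond_wait(&r->not_full, &r->lock);
    r->slot[r->tail] = v;
    r->tail = (r->tail + 1) % RING_SIZE;
    r->count++;
    pthread_cond_signal(&r->not_empty);
    pthread_mutex_unlock(&r->lock);
}

/* Dequeue one value, blocking while the ring is empty. */
static int ring_get(struct ring *r)
{
    int v;

    pthread_mutex_lock(&r->lock);
    while (r->count == 0)
        pthread_cond_wait(&r->not_empty, &r->lock);
    v = r->slot[r->head];
    r->head = (r->head + 1) % RING_SIZE;
    r->count--;
    pthread_cond_signal(&r->not_full);
    pthread_mutex_unlock(&r->lock);
    return v;
}

/* Stage 1: stand-in for the driver; emits npackets frame lengths. */
static void *driver_stage(void *p)
{
    struct stage *s = p;

    for (int i = 0; i < s->npackets; i++)
        ring_put(s->out, FRAME_LEN);
    ring_put(s->out, END_OF_STREAM);
    return NULL;
}

/* Stage 2: stand-in for protocol input; strips the header length. */
static void *protocol_stage(void *p)
{
    struct stage *s = p;
    int len;

    while ((len = ring_get(s->in)) != END_OF_STREAM)
        ring_put(s->out, len - HDR_LEN);
    ring_put(s->out, END_OF_STREAM);
    return NULL;
}

/* Stage 3 runs in the caller: sums the payload bytes delivered. */
long run_pipeline(int npackets)
{
    struct ring rx, up;
    struct stage drv = { NULL, &rx, npackets };
    struct stage proto = { &rx, &up, 0 };
    pthread_t t_drv, t_proto;
    long total = 0;
    int len;

    assert(npackets >= 0);
    ring_init(&rx);
    ring_init(&up);
    pthread_create(&t_proto, NULL, protocol_stage, &proto);
    pthread_create(&t_drv, NULL, driver_stage, &drv);
    while ((len = ring_get(&up)) != END_OF_STREAM)
        total += len;
    pthread_join(t_drv, NULL);
    pthread_join(t_proto, NULL);
    return total;
}
```

Called from a main() of your own, run_pipeline(1000) returns 1460000
(1000 packets of 1500 - 40 payload bytes each). A real stack would want
something cheaper than a mutex and condition variable per packet
between stages; the locking here is only the simplest thing that is
correct.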


Maybe I'm too pessimistic, but after seeing all the corner-cases where
hardware engineers got something as simple as TCP checksum offload ...
not quite right, on the first, or second, or third iteration, I have
very little confidence TOE vendors will get TCP "right" before their
second or third generation, either.
	
So designing a TCP stack that can only ever get high throughput from
TOE NICs strikes me as a losing proposition.



[*] With, I suppose, the possible exception of an Itanic with monster
9MB caches. But NetBSD doesn't run on such CPUs yet anyway.