Subject: Re: splx() optimization [was Re: SMP re-entrancy in "bottom half" drivers]
To: Stephan Uphoff <ups@tree.com>
From: Jonathan Stone <jonathan@dsg.stanford.edu>
List: tech-kern
Date: 05/30/2005 13:52:00
In message <1117408273.1917.69442.camel@palm>, Stephan Uphoff writes:


>Only the IPI and timer interrupt pending bits are ordered.
>( Pending interrupts (except IPI,timer) are not currently scheduled in
>priority order on lowering the interrupt level) 
>
[...]

Hi Stephan,

Thanks for doing this.  I will try to make time to measure both with
and without this patch over the coming week.

Also, apologies for not responding earlier to your messages circa May
18 (dratted allergies!).  I was careless in describing Yamamoto-san's
original patch from 2003.  That patch collapsed all _device_
interrupts to a single level; IPI and timers were separate.

OTOH, the more I measure maximum TCP throughput (with ttcp-like tests),
the more convinced I become that what we (I) really need is a "pipeline" or
"dataflow" TCP, where multiple CPUs can process a single flow.  The tests
I've done so far indicate that, with TSO on wm(4) and a moderate-to-fast CPU,
NetBSD's TCP can send at about 2x the rate it can receive.

The implications for single-stream TCP max throughput are obvious.

If we start going down the route of making the network stack even
partially SMP-safe, then I think the consensus on tech-kern is that
we'll also need to head in the direction of making all interrupts
ordered, with associated, hierarchically-ordered (spin)locks at each level.
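
To make "hierarchically-ordered locks" a bit more concrete, here is a
rough user-space sketch of the rule I have in mind: a CPU may only take
a lock whose level is strictly higher than the highest level it already
holds, and must release in reverse order.  This is not NetBSD code.  The
level names, struct lvl_lock, struct cpu_state and both functions are
made up for illustration, and a real version would spin and raise the
interrupt level rather than just return false.

```c
#include <assert.h>
#include <stdbool.h>

#define LVL_MAX_DEPTH 8

/* Hypothetical levels, lowest to highest; not NetBSD's actual IPL list. */
enum lock_level {
	LVL_NONE = 0, LVL_SOFTNET, LVL_NET, LVL_CLOCK, LVL_IPI
};

struct lvl_lock {
	enum lock_level level;	/* fixed position in the hierarchy */
	bool held;
};

/* Would be per-CPU state in a kernel; one simulated CPU here. */
struct cpu_state {
	enum lock_level stack[LVL_MAX_DEPTH];	/* levels held, innermost last */
	int depth;
};

/* Highest level currently held by this CPU, or LVL_NONE. */
enum lock_level
cpu_cur_level(const struct cpu_state *cpu)
{
	return cpu->depth ? cpu->stack[cpu->depth - 1] : LVL_NONE;
}

/*
 * Take a lock only if that keeps the levels strictly increasing.
 * Returns false on an ordering violation (or if already held).
 */
bool
lvl_lock_acquire(struct cpu_state *cpu, struct lvl_lock *l)
{
	if (l->held || cpu->depth == LVL_MAX_DEPTH ||
	    l->level <= cpu_cur_level(cpu))
		return false;
	cpu->stack[cpu->depth++] = l->level;
	l->held = true;
	return true;
}

/* Release must be in reverse (LIFO) order of acquisition. */
bool
lvl_lock_release(struct cpu_state *cpu, struct lvl_lock *l)
{
	if (!l->held || cpu->depth == 0 ||
	    cpu->stack[cpu->depth - 1] != l->level)
		return false;
	cpu->depth--;
	l->held = false;
	return true;
}
```

With this rule, taking a softnet-level lock and then a net-level lock
succeeds, while trying them in the opposite order is refused, which is
the property that rules out lock-order deadlocks between levels.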

Have you thought about that at all?

[I have my own ideas about where to get the most "bang-for-buck", in
terms of yield for [S]MOP effort, for multi-CPU TCP throughput: make
socket buffers SMP-safe, and process receive-side TCP up into the
socket buffer at splsoftnet(), while allowing a separate CPU to do the
upper-half copyout() in process context.  But that's a separate discussion.]
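
As a very rough illustration of the split described above (one CPU
filling the socket buffer, a different CPU draining it), here is a
user-space sketch of a single-producer, single-consumer byte ring using
C11 atomics.  It is only a model: real socket buffers are mbuf chains,
not byte rings, and struct sb_ring, sb_append() and sb_copyout() are
hypothetical names, not existing kernel interfaces.  It assumes exactly
one producer and one consumer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define SB_SIZE 4096u	/* must be a power of two */

struct sb_ring {
	unsigned char buf[SB_SIZE];
	_Atomic size_t head;	/* total bytes appended; producer writes */
	_Atomic size_t tail;	/* total bytes consumed; consumer writes */
};

void
sb_init(struct sb_ring *sb)
{
	atomic_init(&sb->head, 0);
	atomic_init(&sb->tail, 0);
}

/* Producer side (stands in for receive processing): returns bytes stored. */
size_t
sb_append(struct sb_ring *sb, const void *data, size_t len)
{
	const unsigned char *src = data;
	size_t head = atomic_load_explicit(&sb->head, memory_order_relaxed);
	size_t tail = atomic_load_explicit(&sb->tail, memory_order_acquire);
	size_t space = SB_SIZE - (head - tail);

	if (len > space)
		len = space;
	for (size_t i = 0; i < len; i++)
		sb->buf[(head + i) & (SB_SIZE - 1)] = src[i];
	/* Publish the new bytes only after they are written. */
	atomic_store_explicit(&sb->head, head + len, memory_order_release);
	return len;
}

/* Consumer side (stands in for the copyout() half): returns bytes copied. */
size_t
sb_copyout(struct sb_ring *sb, void *dst, size_t len)
{
	unsigned char *out = dst;
	size_t tail = atomic_load_explicit(&sb->tail, memory_order_relaxed);
	size_t head = atomic_load_explicit(&sb->head, memory_order_acquire);
	size_t avail = head - tail;

	if (len > avail)
		len = avail;
	for (size_t i = 0; i < len; i++)
		out[i] = sb->buf[(tail + i) & (SB_SIZE - 1)];
	/* Hand the space back only after the bytes are copied out. */
	atomic_store_explicit(&sb->tail, tail + len, memory_order_release);
	return len;
}
```

The point of the sketch is that neither side ever writes the other's
index, so the two CPUs need no shared lock on the data path.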