tech-kern: Re: splx() optimization [was Re: SMP re-entrancy in "bottom half" drivers]

Subject: Re: splx() optimization [was Re: SMP re-entrancy in "bottom half" drivers]
To: YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>
From: Jonathan Stone <jonathan@dsg.stanford.edu>
List: tech-kern
Date: 06/07/2005 17:33:02

In message <1118189679.345817.301.nullmailer@yamt.dyndns.org>,
YAMAMOTO Takashi writes:

>> >what you're saying here is that you want performance but
>> >are not interested in necessary work (== audio).
>> >i don't think it's reasonable.
>> 
>> Audio drivers have kHz sample rates and generally handle DMA buffers
>> with some nontrivial number of samples. 10GbE ethernet NICs deliver
>> 800,000 1500-byte packets per second. You seriously think you can put
>> an onus on a developer, that working on the low-rate audio problem is
>> a pre-requisite to tackling the 10GbE problem?
>> 
>> That strikes me as indefensible.
>
>interrupt rate is not important at all.
[reordering]
>
>Jonathan, i can understand your frustration.
>however, i don't think there is a sane shortcut.

Rate is important in that there's no compelling _performance_ case to
make audio SMP-safe.  A single processor with an implicit lock (i.e.,
kernel_lock, acquired around all code that does or touches audio) is
adequate.

I don't hear anyone as bitterly frustrated by single-CPU bottlenecks
for disk or audio or ... as I am frustrated for high-speed networking.

Maybe that makes me overly-willing to entertain approaches that strike
you or Jason or others as less than ideal, or less than sane.  I'm
perfectly willing to buy what I see as a very modest overhead for the
common-case: locore code grabs a per-ipl lock.

>the fact it can interrupt IPL_NET code is important.

Yes, there I am sure we all agree. Same goes for ttys and ppp code.
Hence my suggestion of IPL-locks.
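
To make the IPL-lock idea concrete, here is a minimal user-space sketch,
not kernel code. Everything in it is an assumption for illustration: the
level names and values, the sketch_splraise()/sketch_splx() names, the use
of pthread mutexes in place of spin locks, and the choice to take every
lock between the current level and the new one in ascending order.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical IPL levels for this sketch; not NetBSD's actual values. */
enum { IPL_NONE = 0, IPL_SOFTNET, IPL_NET, IPL_TTY, IPL_AUDIO, NIPL };

/* One lock per IPL. Slot 0 (IPL_NONE) is never taken. */
static pthread_mutex_t ipl_lock[NIPL] = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
    PTHREAD_MUTEX_INITIALIZER
};

/* Per-thread stand-in for the per-CPU current priority level. */
static _Thread_local int cur_ipl = IPL_NONE;

/*
 * Raise to ipl, taking the lock of every level above the current one
 * up to and including ipl. Always locking in ascending order gives a
 * single global lock order. Returns the previous level for splx.
 */
int sketch_splraise(int ipl)
{
    int old = cur_ipl;

    assert(ipl >= 0 && ipl < NIPL);
    for (int l = old + 1; l <= ipl; l++)
        pthread_mutex_lock(&ipl_lock[l]);
    if (ipl > old)
        cur_ipl = ipl;
    return old;
}

/* Drop back to old, releasing locks in descending order. */
void sketch_splx(int old)
{
    assert(old >= 0 && old < NIPL);
    for (int l = cur_ipl; l > old; l--)
        pthread_mutex_unlock(&ipl_lock[l]);
    if (old < cur_ipl)
        cur_ipl = old;
}
```

Usage follows the familiar pattern: s = sketch_splraise(IPL_NET); touch
the shared state; sketch_splx(s). In this model, raising to a level at or
below the current one takes no locks, so nested sections at the same level
cost nothing extra.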

>
>you can tackle network code today as i said in another mail.
>(http://mail-index.NetBSD.org/tech-kern/2005/06/07/0014.html)
>is there any compelling reason not to do so?

I don't see a viable alternative there.  Restricting changes "to my
local tree" is not viable. Flattening all IPLs strikes me as worse
than lock-per-IPL.  Then again, maybe that's because I expect my
workload to have one (or more) CPUs handling device-input/ether-input,
and a second CPU handling splsoftnet(), indefinitely.

I have a profiled kernel build in progress. I will boot that profiled
kernel, set up 500 Mbyte/sec of TCP traffic through an Opteron (250
Mbyte/sec in, 250 Mbyte/sec out, sustained); and measure what the
splx() rate is before any changes.  Oops, I see I built NetBSD-3.0
rather than -current; hope that's good enough for discussion.
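
For a rough sense of scale, the byte rates above can be turned into packet
rates with the arithmetic below. It is a back-of-envelope sketch only: it
assumes every packet is a full 1500 bytes, takes 1 Mbyte as 10^6 bytes,
and ignores framing overhead. pkts_per_sec() is a made-up helper name.

```c
#include <stdint.h>

/*
 * Packets per second for a given byte rate, assuming every packet is
 * pkt_bytes long. Integer division, so the result is rounded down.
 */
static uint64_t pkts_per_sec(uint64_t bytes_per_sec, uint64_t pkt_bytes)
{
    return bytes_per_sec / pkt_bytes;
}
```

Under those assumptions, 250 Mbyte/sec in one direction is
pkts_per_sec(250000000, 1500) = 166666 packets per second, and a 10GbE
line rate of 1250000000 bytes/sec gives 833333, the same ballpark as the
800,000 figure quoted earlier.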