tech-kern: Re: splx() optimization [was Re: SMP re-eetrancy in "bottom half" drivers]

Subject: Re: splx() optimization [was Re: SMP re-eetrancy in "bottom half" drivers]
To: Tonnerre <tonnerre@thundrix.ch>
From: Jonathan Stone <jonathan@dsg.stanford.edu>
List: tech-kern
Date: 06/02/2005 11:27:03

In message <20050602175328.GA8886@pauli.thundrix.ch>,
Tonnerre writes:

>Salut,
>
>On Wed, Jun 01, 2005 at 11:35:35PM -0700, Bill Studenmund wrote:
>> I thought the point was that interrupt handlers now will use spin locks to
>> protect their data. When an interrupt handler runs, it will run with SPL
>> set to block its interrupt on the CPU it's on, and it will also grab a
>> spinlock so that no other CPU services this inter (and also keep other
>> processing that would now block interrupts out of the data).
>
>I remember documents stating that it's not a problem if two CPUs serve the
>same softint, it would only give you trouble for hardware interrupts,
>obviously. Good enough locking should ensure that two CPUs processing the
>same softint won't interfere with each other.

Hello Tonnerre,

My first (and second) thought is that whatever document you saw does
not apply to NetBSD (or any BSD from UC Berkeley CSRG, for that matter).

The classic BSD TCP stack has _no_ locking in the networking stack:
none whatsoever. The only synchronization is via SPLs: for TCP, that's
via splsoftnet(), from ip_intrq processing upward.  Therefore, if two
CPUs attempt to process at splsoftnet at the same time, they *will*
run into race conditions all through the stack.
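
To make that concrete, here is a toy userspace model (not kernel code)
of why an SPL alone excludes nothing across CPUs. It assumes, as Bill
says above, that the SPL is per-CPU state. Every name in it
(model_splsoftnet, model_splx, softnet_can_run, the MODEL_IPL_* values)
is made up for illustration:

```c
#include <stdbool.h>

#define MODEL_NCPU        2
#define MODEL_IPL_NONE    0   /* hypothetical numeric levels */
#define MODEL_IPL_SOFTNET 1

/* One interrupt priority level per CPU: this is the whole point. */
static int cpu_ipl[MODEL_NCPU];

/* Raise the given CPU to softnet level; return the previous level. */
int model_splsoftnet(int cpu)
{
    int old = cpu_ipl[cpu];

    if (old < MODEL_IPL_SOFTNET)
        cpu_ipl[cpu] = MODEL_IPL_SOFTNET;
    return old;
}

/* Restore a previously saved level on the given CPU. */
void model_splx(int cpu, int old)
{
    cpu_ipl[cpu] = old;
}

/* Could a softnet interrupt be taken on this CPU right now? */
bool softnet_can_run(int cpu)
{
    return cpu_ipl[cpu] < MODEL_IPL_SOFTNET;
}
```

Raising the level on CPU 0 blocks softnet processing on CPU 0 only;
softnet_can_run(1) still returns true, so CPU 1 is free to walk into
the same data structures.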

Even if you allow only *one* CPU to run the networking stack at
splsoftnet(), there's still a large set of possible races between
the CPU at splsoftnet() (which, by hypothesis, is outside the biglock)
on the one hand; and on the other hand, CPUs holding the biglock
running inside the kernel in sosend() or soreceive().

>Otherwise it might be good to simply not do softint processing if another
>CPU does it at the moment, instead of spinning like we do now.

That's what I'd like to see in the short term, and what I was thinking
of in my first reply. But you still need to prevent races with
upper-half code: that becomes very hairy very fast.  Again, I saw two
choices:

#1: If you try to hide the locking inside splsoftnet(), the kernel
bottom-half really wants a "try to acquire the splsoftnet lock, but don't
block if another CPU is already running it" primitive, whereas upper-half
code wants a real blocking lock.

#2: Alternatively, we need to add real explicit synchronization
primitives around all access to, and manipulation of, socket state and
socket buffers.  That's a lot of work (though I expect we could get a
long way by reusing the work done in FreeBSD-5).
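
The asymmetry in #1 can be sketched in userspace with POSIX mutexes
standing in for whatever lock would be hidden inside splsoftnet().
This is only a model under that assumption, not a proposed kernel
interface; softnet_lock, softnet_bottom_half(), upper_half_enter()
and upper_half_exit() are hypothetical names:

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the lock hidden inside splsoftnet(). */
pthread_mutex_t softnet_lock = PTHREAD_MUTEX_INITIALIZER;
int softnet_runs;   /* number of times the bottom half actually ran */

/*
 * Bottom half: try to take the lock, but never wait for it.
 * Returns false if someone else already holds the lock.
 */
bool softnet_bottom_half(void)
{
    if (pthread_mutex_trylock(&softnet_lock) != 0)
        return false;           /* busy: skip this pass, do not spin */
    softnet_runs++;             /* stands in for protocol processing */
    pthread_mutex_unlock(&softnet_lock);
    return true;
}

/* Upper half (think sosend()/soreceive()): a real blocking acquire. */
void upper_half_enter(void)
{
    pthread_mutex_lock(&softnet_lock);
}

void upper_half_exit(void)
{
    pthread_mutex_unlock(&softnet_lock);
}
```

While the upper half holds the lock, softnet_bottom_half() returns
false instead of spinning; something then has to make sure the skipped
work is picked up later, which is part of why this gets hairy.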

I'd very much like to see #2, but (as I said), you are biting off
a lot more work than one might have thought at first sight.
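
For a flavour of what #2 means at the smallest scale, here is a
userspace sketch of a socket buffer whose byte count is only ever
touched under its own lock. It is a model, not the FreeBSD-5 or NetBSD
code; struct sockbuf_model and the sbm_*() functions are hypothetical
names, and a real socket buffer carries far more state than this:

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified socket buffer. */
struct sockbuf_model {
    pthread_mutex_t lock;   /* protects cc and hiwat */
    size_t cc;              /* bytes currently queued */
    size_t hiwat;           /* maximum bytes allowed */
};

void sbm_init(struct sockbuf_model *sb, size_t hiwat)
{
    pthread_mutex_init(&sb->lock, NULL);
    sb->cc = 0;
    sb->hiwat = hiwat;
}

/* Protocol side: queue len bytes if there is room. */
bool sbm_append(struct sockbuf_model *sb, size_t len)
{
    bool ok = false;

    pthread_mutex_lock(&sb->lock);
    if (len <= sb->hiwat - sb->cc) {
        sb->cc += len;
        ok = true;
    }
    pthread_mutex_unlock(&sb->lock);
    return ok;
}

/* Socket side: remove up to want bytes; return how many were removed. */
size_t sbm_drain(struct sockbuf_model *sb, size_t want)
{
    size_t got;

    pthread_mutex_lock(&sb->lock);
    got = want < sb->cc ? want : sb->cc;
    sb->cc -= got;
    pthread_mutex_unlock(&sb->lock);
    return got;
}
```

Every other access to socket state and socket buffers would need the
same treatment, and that is where the work is.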