tech-net archive


Re: RFC: softint-based if_input



On Tue, Jan 26, 2016 at 6:52 AM, David Young <dyoung%pobox.com@localhost> wrote:
> On Mon, Jan 25, 2016 at 04:47:32PM +0900, Ryota Ozaki wrote:
>> On Mon, Jan 25, 2016 at 3:53 PM, Ryota Ozaki <ozaki-r%netbsd.org@localhost> wrote:
>> > On Mon, Jan 25, 2016 at 1:06 PM, Taylor R Campbell
>> > <campbell+netbsd-tech-kern%mumble.net@localhost> wrote:
>> >>    Date: Mon, 25 Jan 2016 11:25:16 +0900
>> >>    From: Ryota Ozaki <ozaki-r%netbsd.org@localhost>
>> >>
>> >>    On Tue, Jan 19, 2016 at 2:22 PM, Ryota Ozaki <ozaki-r%netbsd.org@localhost> wrote:
>> >>    (snip)
>> >>    >> (a) a per-CPU pktq that never distributes packets to another CPU, or
>> >>    >> (b) a single-CPU pktq, to be used only from the CPU to which the
>> >>    >> device's (queue's) interrupt handler is bound.
>> >>    >>
>> >>    > I'll rewrite the patch as you suggest (I prefer (a) for now).
>> >>
>> >>    While rewriting it, I felt it was turning into a lesser version of
>> >>    pktqueue. So I think it may be better to change pktqueue to take a
>> >>    flag that disables distributing packets between CPUs than to
>> >>    implement another facility that duplicates pktqueue. Here is a
>> >>    patch with that approach:
>> >>    http://www.netbsd.org/~ozaki-r/pktq-without-ipi.diff
>> >>
>> >>    If we call pktq_create with PKTQ_F_NO_DISTRIBUTION, pktqueue doesn't
>> >>    set up an IPI for the softint and never calls softint_schedule_cpu
>> >>    (i.e., it never distributes packets).
>> >>
>> >>    How about the approach?
>> >>
>> >> Some disjointed thoughts:
>> >>
>> >> 1. I don't think you actually need to change pktq(9).  It looks like
>> >> if you pass in cpu_index(curcpu()) for the hash, it will consistently
>> >> use the current CPU, for which softint_schedule_cpu has a special case
>> >> that avoids ipi.  So I don't expect it's substantially different from
>> >> <https://www.netbsd.org/~ozaki-r/softint-if_input.diff> -- though
>> >> maybe measurements will show my analysis is wrong!
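
For reference, here is a minimal sketch of that hash trick in a driver's
Rx interrupt path (rx_pktq, xx_softc, and xx_rxintr are invented names
for illustration; as Taylor says, measurements may prove the analysis
wrong):

    #include <sys/cpu.h>
    #include <sys/mbuf.h>
    #include <net/pktqueue.h>

    /*
     * Passing cpu_index(curcpu()) as the hash makes pktq_enqueue pick
     * the current CPU (cpuid = hash % ncpu), so softint_schedule_cpu
     * takes its local-CPU path and no IPI is sent.
     */
    static pktqueue_t *rx_pktq;         /* created once with pktq_create() */

    static void
    xx_rxintr(struct xx_softc *sc, struct mbuf *m)
    {

            /* The caller frees the mbuf itself when the queue is full. */
            if (!pktq_enqueue(rx_pktq, m, cpu_index(curcpu())))
                    m_freem(m);
    }
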
>> >
>> > My intention is to avoid ipi_register in pktq_create, so that
>> > we don't need to move ipi_sysinit...
>> >
>> >>
>> >> 2. Even though you avoid ipi(9), you're still using pcq(9), which
>> >> requires interprocessor synchronization -- but that is an unnecessary
>> >> cost because you're simply passing packets from hardintr to softintr
>> >> context on a single CPU.  So that's why I specifically suggested ifq,
>> >> not pcq or pktqueue.
>> >
>> > ...though, right. membars in pcq(9) are just overhead.
>> >
>> > Okay, I'll implement softint + per-CPU ifqs.
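
For reference, a minimal sketch of the ifq-based hand-off suggested
above (the xx_* and sc_* names are invented; the actual patch will
differ):

    #include <sys/intr.h>
    #include <sys/mbuf.h>
    #include <net/if.h>

    /*
     * Hardintr at IPL_NET: put the packet on an ifqueue touched only
     * by this CPU and kick a softint on the same CPU -- no IPI, and
     * no pcq(9) memory barriers.
     */
    static void
    xx_rxintr(struct xx_softc *sc, struct mbuf *m)
    {

            if (IF_QFULL(&sc->sc_rxq)) {
                    IF_DROP(&sc->sc_rxq);
                    m_freem(m);
                    return;
            }
            IF_ENQUEUE(&sc->sc_rxq, m);
            softint_schedule(sc->sc_rx_sih);
    }

    /* Softint on the same CPU: drain the queue into the stack. */
    static void
    xx_rxsoftint(void *arg)
    {
            struct xx_softc *sc = arg;
            struct ifnet *ifp = &sc->sc_ethercom.ec_if;
            struct mbuf *m;
            int s;

            for (;;) {
                    s = splnet();           /* block the Rx hardintr */
                    IF_DEQUEUE(&sc->sc_rxq, m);
                    splx(s);
                    if (m == NULL)
                            break;
                    (*ifp->if_input)(ifp, m);
            }
    }
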
>> >
>> >>
>> >> 3. Random thought: If we do polling, I wonder whether instead of (or
>> >> in addition to) polling for up to (say) 100 packets in a softint, we
>> >> really ought to poll for arbitrarily many packets in a kthread with
>> >> KTHREAD_TS, so that we don't need to go back and forth between
>> >> hardintr/softintr during high throughput, but we also don't starve
>> >> user threads in that case.
>> >
>> > Actually that was a proof-of-concept implementation, just to measure
>> > how efficient polling is (or isn't). So I don't intend to use the
>> > implementation as it is.
>> >
>> >>
>> >> I seem to recall starvation of user threads is what motivated matt@ to
>> >> split packet processing between a softint and a workqueue, depending
>> >> on the load, in bcmeth(4) (sys/arch/arm/broadcom/bcm53xx_eth.c).
>> >> Maybe he can comment on this?  Have you studied how this driver works,
>> >> and maybe pq3etsec(4) too, which also does polling?
>> >
>> > I had read pq3etsec(4) but not bcmeth(4). pq3etsec(4) seems to use
>> > only softint.
>> >
>> > Anyway, I was also concerned about user-thread starvation while
>> > implementing polling in wm(4). So the combined use of a softint and
>> > a workqueue sounds good. (FreeBSD's igb driver also uses a similar
>> > technique, IIUC.)
>>
>> Hmm, I misunderstood a bit. bcmeth(4) kicks a softint OR a workqueue
>> from the HW interrupt depending on the load (I had thought the HW
>> interrupt always scheduled the softint, and the softint kicked the
>> workqueue if more packets kept arriving). I'm curious about the
>> throughput and latency of this approach :)
>
> I tried once to make network-processing softints provide opportunities
> for user threads to run, but I realized after struggling with it that
> I was essentially solving a scheduling problem when we already had an
> adequate scheduler in the kernel.  I ended up using a timesharing thread
> to process the Rx ring and a very basic hardware Rx-interrupt handler,
> kind of like this:
>
> hardware interrupt handler:
>         disable interrupts
>         wake processing thread
>
> processing thread:
>         loop forever:
>                 enable interrupts
>                 wait for wakeup
>                 for each Rx packet on ring:
>                         process packet
>
> That stopped the user-tickle watchdog from firing.  It was handy having
> a full-fledged thread context to process packets in.  But there were
> trade-offs.  As Matt Thomas pointed out to me, if it takes longer for
> the NIC to read the next packet off of the network than it takes your
> thread to process the current packet, then your Rx thread is going to go
> back to sleep again after every single packet.  So there's potentially
> a lot of context-switch overhead and latency when you're receiving
> back-to-back large packets.
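
For reference, my reading of that pattern in C (the xx_* helpers, the
lock, and the condvar are all invented names):

    #include <sys/mutex.h>
    #include <sys/condvar.h>
    #include <sys/kthread.h>

    /* Hardintr: mask further Rx interrupts and wake the thread. */
    static int
    xx_intr(void *arg)
    {
            struct xx_softc *sc = arg;

            xx_rxintr_disable(sc);          /* device-specific */
            mutex_enter(&sc->sc_rx_lock);   /* spin mutex at IPL_NET */
            sc->sc_rx_pending = true;
            cv_signal(&sc->sc_rx_cv);
            mutex_exit(&sc->sc_rx_lock);
            return 1;
    }

    /*
     * Rx thread, created as a timeshared thread so the scheduler can
     * preempt it, e.g.:
     * kthread_create(PRI_NONE, KTHREAD_TS | KTHREAD_MPSAFE, NULL,
     *     xx_rxthread, sc, &sc->sc_rx_lwp, "xxrx");
     */
    static void
    xx_rxthread(void *arg)
    {
            struct xx_softc *sc = arg;

            for (;;) {
                    xx_rxintr_enable(sc);   /* device-specific */
                    mutex_enter(&sc->sc_rx_lock);
                    while (!sc->sc_rx_pending)
                            cv_wait(&sc->sc_rx_cv, &sc->sc_rx_lock);
                    sc->sc_rx_pending = false;
                    mutex_exit(&sc->sc_rx_lock);
                    xx_rxeof(sc);           /* process the whole Rx ring */
            }
    }
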

IIUC, bcmeth(4) solves (a part of?) the issue by using both a softint
and a workqueue; when the load is not so high, only the softint is
dispatched, and there is less context-switch overhead than with an
always-running Rx thread. However, I'm not sure whether the approach
actually works well in practice. Roughly, I imagine the dispatch looks
like the sketch below.
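
(A sketch only; the threshold and xx_* names are invented, and the real
logic is in sys/arch/arm/broadcom/bcm53xx_eth.c:)

    #include <sys/intr.h>
    #include <sys/workqueue.h>

    #define XX_WQ_THRESHOLD 32      /* invented load threshold */

    /*
     * Hardintr: under light load dispatch a softint; when the ring
     * has piled up, hand off to a workqueue, which runs in a full,
     * preemptible thread context and so doesn't starve user threads.
     * (Real code must also avoid re-enqueueing sc_rx_wk while it is
     * still pending.)
     */
    static int
    xx_intr(void *arg)
    {
            struct xx_softc *sc = arg;

            xx_rxintr_disable(sc);  /* device-specific */
            if (xx_rxring_depth(sc) < XX_WQ_THRESHOLD)
                    softint_schedule(sc->sc_rx_sih);
            else
                    workqueue_enqueue(sc->sc_rx_wq, &sc->sc_rx_wk, NULL);
            return 1;
    }
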

>
> ISTR Matt had some ideas how context switches could be made faster, or
> h/w interrupt handlers could have an "ordinary" thread context, or the
> scheduler could control the rate of softints, or all of the above.  I
> don't know if there's been any progress along those lines in the mean
> time.

He left some notes at http://www.netbsd.org/~matt/smpnet , but I'm not
sure whether they cover the above ideas. I don't think any of them are
in -current yet.

  ozaki-r

