tech-net archive


Re: RFC: softint-based if_input



On Mon, Jan 25, 2016 at 04:47:32PM +0900, Ryota Ozaki wrote:
> On Mon, Jan 25, 2016 at 3:53 PM, Ryota Ozaki <ozaki-r%netbsd.org@localhost> wrote:
> > On Mon, Jan 25, 2016 at 1:06 PM, Taylor R Campbell
> > <campbell+netbsd-tech-kern%mumble.net@localhost> wrote:
> >>    Date: Mon, 25 Jan 2016 11:25:16 +0900
> >>    From: Ryota Ozaki <ozaki-r%netbsd.org@localhost>
> >>
> >>    On Tue, Jan 19, 2016 at 2:22 PM, Ryota Ozaki <ozaki-r%netbsd.org@localhost> wrote:
> >>    (snip)
> >>    >> (a) a per-CPU pktq that never distributes packets to another CPU, or
> >>    >> (b) a single-CPU pktq, to be used only from the CPU to which the
> >>    >> device's (queue's) interrupt handler is bound.
> >>    >>
> >>    > I'll rewrite the patch per your suggestion (I prefer (a) for now).
> >>
> >>    While rewriting it, I realized that it amounts to a lesser version of
> >>    pktqueue. So I think it may be better to change pktqueue to take a flag
> >>    that disables distributing packets between CPUs than to implement
> >>    another facility that duplicates pktqueue. Here is a patch with that
> >>    approach:
> >>    http://www.netbsd.org/~ozaki-r/pktq-without-ipi.diff
> >>
> >>    If we call pktq_create with PKTQ_F_NO_DISTRIBUTION, pktqueue doesn't
> >>    set up an IPI for the softint and never calls softint_schedule_cpu
> >>    (i.e., never distributes packets).
> >>
> >>    What do you think of this approach?
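For concreteness, here is roughly how I read that proposal -- a sketch
only, assuming pktq_create grows a flags argument; the field and
handler names are made up, not taken from the diff:

	pktqueue_t *
	pktq_create(size_t maxlen, void (*intrh)(void *), void *sc,
	    u_int flags)
	{
		pktqueue_t *pq;

		pq = kmem_zalloc(sizeof(*pq), KM_SLEEP);
		pq->pq_maxlen = maxlen;
		pq->pq_flags = flags;
		pq->pq_sih = softint_establish(SOFTINT_NET | SOFTINT_MPSAFE,
		    intrh, sc);
		if ((flags & PKTQ_F_NO_DISTRIBUTION) == 0) {
			/*
			 * Only a queue that may hand packets to another
			 * CPU's softint needs the cross-call IPI.
			 */
			pq->pq_ipi = ipi_register(pktq_ipi_intr, pq);
		}
		return pq;
	}
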
> >>
> >> Some disjointed thoughts:
> >>
> >> 1. I don't think you actually need to change pktq(9).  It looks like
> >> if you pass in cpu_index(curcpu()) for the hash, it will consistently
> >> use the current CPU, for which softint_schedule_cpu has a special case
> >> that avoids ipi.  So I don't expect it's substantially different from
> >> <https://www.netbsd.org/~ozaki-r/softint-if_input.diff> -- though
> >> maybe measurements will show my analysis is wrong!
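In code, Taylor's point 1 would amount to something like this at the
driver's Rx enqueue point (the wrapper is made up for illustration;
pktq_enqueue and cpu_index are the existing interfaces):

	static void
	rx_enqueue(pktqueue_t *pq, struct mbuf *m)
	{
		/*
		 * Hash = current CPU index, so pktqueue always picks the
		 * local queue and softint_schedule_cpu takes its no-IPI
		 * fast path.
		 */
		if (!pktq_enqueue(pq, m, cpu_index(curcpu())))
			m_freem(m);	/* queue full: drop */
	}
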
> >
> > My intention is to avoid the ipi_register call in pktq_create,
> > so that we don't need to move ipi_sysinit...
> >
> >>
> >> 2. Even though you avoid ipi(9), you're still using pcq(9), which
> >> requires interprocessor synchronization -- but that is an unnecessary
> >> cost because you're simply passing packets from hardintr to softintr
> >> context on a single CPU.  So that's why I specifically suggested ifq,
> >> not pcq or pktqueue.
> >
> > ...though, you're right. The membars in pcq(9) are just overhead.
> >
> > Okay, I'll implement softint + percpu ifqs.
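On a single CPU the ifq-based handoff could look roughly like this --
a sketch with illustrative names; because enqueue and dequeue never
cross CPUs, spl protection stands in for pcq(9)'s atomics and memory
barriers:

	struct rx_softc {
		struct ifnet	*sc_ifp;	/* our interface */
		struct ifqueue	sc_rxq;		/* per-CPU input queue */
		void		*sc_sih;	/* per-CPU softint handle */
	};

	/* Hard interrupt context; softints cannot preempt us here. */
	static void
	rx_hardintr_input(struct rx_softc *sc, struct mbuf *m)
	{
		IF_ENQUEUE(&sc->sc_rxq, m);
		softint_schedule(sc->sc_sih);	/* current CPU, no IPI */
	}

	/* Softint context, same CPU as the hard interrupt handler. */
	static void
	rx_softintr(void *arg)
	{
		struct rx_softc *sc = arg;
		struct mbuf *m;

		for (;;) {
			int s = splnet();	/* block our hardintr */
			IF_DEQUEUE(&sc->sc_rxq, m);
			splx(s);
			if (m == NULL)
				break;
			(*sc->sc_ifp->if_input)(sc->sc_ifp, m);
		}
	}
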
> >
> >>
> >> 3. Random thought: If we do polling, I wonder whether instead of (or
> >> in addition to) polling for up to (say) 100 packets in a softint, we
> >> really ought to poll for arbitrarily many packets in a kthread with
> >> KTHREAD_TS, so that we don't need to go back and forth between
> >> hardintr/softintr during high throughput, but we also don't starve
> >> user threads in that case.
> >
> > Actually, that was a proof-of-concept implementation just to measure
> > how efficient polling is (or isn't), so I don't intend to use the
> > implementation as it is.
> >
> >>
> >> I seem to recall starvation of user threads is what motivated matt@ to
> >> split packet processing between a softint and a workqueue, depending
> >> on the load, in bcmeth(4) (sys/arch/arm/broadcom/bcm53xx_eth.c).
> >> Maybe he can comment on this?  Have you studied how this driver works,
> >> and maybe pq3etsec(4) too, which also does polling?
> >
> > I had read pq3etsec(4) but not bcmeth(4). pq3etsec(4) seems to use
> > only a softint.
> >
> > Anyway, I was also concerned about user-thread starvation while
> > implementing polling in wm(4), so the combined use of a softint and
> > a workqueue sounds good. (FreeBSD's igb driver uses a similar
> > technique, IIUC.)
> 
> Hmm, I misunderstood a bit. bcmeth(4) kicks a softint OR a workqueue
> from the HW interrupt, depending on the load (I had thought the HW
> interrupt always called the softint, and the softint kicked the
> workqueue if more packets were incoming). I'm curious about the
> throughput and latency of this approach :)
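
As I understand the bcmeth(4) shape from this thread, the dispatch is
roughly the following; the threshold and the rx_* helpers are invented
for illustration, and the workqueue is assumed to have been created at
IPL_NET so it can be enqueued from the hard interrupt:

	#define	RX_HEAVY_THRESHOLD	100	/* packets; illustrative */

	static int
	rx_hardintr(void *arg)
	{
		struct rx_softc *sc = arg;

		rx_disable_intr(sc);		/* hypothetical helper */
		if (rx_pending_packets(sc) < RX_HEAVY_THRESHOLD) {
			/* Light load: low-latency softint. */
			softint_schedule(sc->sc_sih);
		} else {
			/*
			 * Heavy load: hand off to a thread the scheduler
			 * can preempt, so user threads are not starved.
			 */
			workqueue_enqueue(sc->sc_wq, &sc->sc_work, NULL);
		}
		return 1;
	}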

I tried once to make network-processing softints provide opportunities
for user threads to run, but I realized after struggling with it that
I was essentially solving a scheduling problem when we already had an
adequate scheduler in the kernel.  I ended up using a timesharing thread
to process the Rx ring and a very basic hardware Rx-interrupt handler,
kind of like this:

hardware interrupt handler:
	disable interrupts
	wake processing thread

processing thread:
	loop forever:
		enable interrupts
		wait for wakeup
		for each Rx packet on ring:
			process packet
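
In NetBSD kernel C that comes out roughly as follows -- a sketch only,
with invented helpers for the ring and the interrupt-mask pokes, plus
a wakeup flag so an interrupt that fires before the thread sleeps
isn't lost:

	/* Hard interrupt handler. */
	static void
	rx_hardintr(struct rx_softc *sc)
	{
		rx_disable_intr(sc);		/* hypothetical helper */
		mutex_enter(&sc->sc_mtx);	/* spin mutex at IPL_NET */
		sc->sc_wakeup = true;
		cv_signal(&sc->sc_cv);		/* wake processing thread */
		mutex_exit(&sc->sc_mtx);
	}

	/* Timesharing processing thread. */
	static void
	rx_thread(void *arg)
	{
		struct rx_softc *sc = arg;
		struct mbuf *m;

		for (;;) {
			rx_enable_intr(sc);	/* hypothetical helper */
			mutex_enter(&sc->sc_mtx);
			while (!sc->sc_wakeup)
				cv_wait(&sc->sc_cv, &sc->sc_mtx);
			sc->sc_wakeup = false;
			mutex_exit(&sc->sc_mtx);
			while ((m = rx_ring_next(sc)) != NULL)
				(*sc->sc_ifp->if_input)(sc->sc_ifp, m);
		}
	}

The thread itself would be created timeshared, with something like
kthread_create(PRI_NONE, KTHREAD_TS | KTHREAD_MPSAFE, NULL, rx_thread,
sc, &sc->sc_lwp, "%srx", device_xname(sc->sc_dev)).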

That stopped the user-tickle watchdog from firing.  It was handy having
a full-fledged thread context to process packets in.  But there were
trade-offs.  As Matt Thomas pointed out to me, if it takes longer for
the NIC to read the next packet off the network than it takes your
thread to process the current packet, then your Rx thread is going to go
back to sleep again after every single packet.  So there's potentially
a lot of context-switch overhead and latency when you're receiving
back-to-back large packets.

ISTR Matt had some ideas how context switches could be made faster, or
h/w interrupt handlers could have an "ordinary" thread context, or the
scheduler could control the rate of softints, or all of the above.  I
don't know if there's been any progress along those lines in the
meantime.

Dave

-- 
David Young
dyoung%pobox.com@localhost    Urbana, IL    (217) 721-9981

