tech-kern archive

Re: struct ifnet and ifaddr handling [was: Re: Making global variables of if.c MPSAFE]

On Fri, Nov 14, 2014 at 6:31 AM, David Young <> wrote:
> On Thu, Nov 13, 2014 at 04:26:52AM +0000, Taylor R Campbell wrote:
>>    Date: Thu, 13 Nov 2014 12:43:26 +0900
>>    From: Ryota Ozaki <>
>>    Here is a new patch:
>>    I think the patch reflects rmind's suggestions:
>>    - Use pserialize for IFNET_FOREACH
>>      - but use a lock for blockable/sleepable critical sections
>>    - cpu_intr_p workaround for HW interrupt
>>    Any comments?
>> Hmm...some quick notes from a non-expert in sys/net:
>> - You call malloc(M_WAITOK) while the ifnet lock is held, in
>>   if_alloc_sadl_locked, which is not allowed.
>> - You call copyout in a pserialize read section, in ifconf, which is
>>   not allowed because copyout may block.
>> - I don't know what cpu_intr_p is working around but it's probably not
>>   a good idea!
>> Generally, all that you are allowed to do in a pserialize read section
>> is read a small piece of information, or grab a reference to a data
>> structure which you are then going to use outside the read section.
>> I don't think it's going to be easy to scalably parallelize this code
>> without restructuring it, unless as a stop-gap you use a heavier-weight
>> reader-writer lock like the prwlock at
>> <>.
>> (No idea how much overhead this might add.)
> Parallelizing the network code without restructuring it?  That
> sounds like something I have tried before.  Avoid it, if you can!
> When I was confronted at a previous job with the problem of rapidly
> MP-ifying the network stack, I introduced a lightweight reader/writer
> lock for the network configuration.  I called that lock the "corral."
> A thread had to enter the corral before it read or modified a route,
> ifnet, ifaddr, etc.  There could be multiple readers in the corral,
> or one writer in the corral, never a reader and writer at once---i.e.,
> usual reader/writer semantics.  The corral was designed so that
> readers could enter and exit very quickly without using any locked
> instructions or modifying any shared cachelines in the common case.
> It was fairly expensive for a writer to enter the corral, or for
> a reader to wait for a writer to exit before it entered.
> The corral was introduced gradually, the kernel_lock and softnet_lock
> gradually phased out.  In retrospect, I probably introduced the
> corral_enter() calls in a different order with KERNEL_LOCK() and
> mutex_enter(softnet_lock) calls than I should have, and that caused
> me a lot of grief.  It did not help that the kernel did not make
> KERNEL_LOCK() and mutex_enter(softnet_lock) calls in a consistent
> order.
> IIRC, some of the waits in the corral entailed cv_wait() and other
> blocking calls that could not be made from an interrupt context.
> Anyway, I ended up running virtually all packet processing
> in a LWP context so that I could use whichever synchronization
> objects I liked.  NIC interrupts were just responsible for waking
> per-CPU threads that processed the Rx rings.  This approach can
> work ok if your starting point is the inefficient legacy NetBSD
> packet processing and there is a preponderance of small packets.
> Matt Thomas once explained to me some serious overheads that crop
> up when the inter-packet interval is long and your packet processing
> is tuned up, and I might make different trade-offs between
> interrupt/LWP processing with 20/20 hindsight.
> Anyway, while I worked on the rapid MP-ification project, I had on
> my desk an eminent engineer's thoroughgoing and ambitious plan for
> MP-ifying a *BSD network stack: a rational MP-ification project.
> I don't think that my team could have implemented that plan on a
> timescale that met the business need, but if you take a global
> perspective---how many times will persons and projects around the
> world be hobbled by NetBSD's legacy network stack before it
> is finally restructured?---then it's pretty clear that now is the
> time to begin a rational project.

Thank you so much. It's thought-provoking.

First, I'm not against restructuring, though I had hoped to keep
restructuring to a minimum on non-performance-sensitive paths.

BTW, do you think we will eventually introduce restructuring along the
lines of "all packet processing in an LWP context"? I'm inclined to run
Layer 2 and bpf in softint. If we are going to end up with that kind of
restructuring as well, I want to do it early. Of course, we have to
address performance issues somehow at some point.


> Dave
> --
> David Young
>    Urbana, IL    (217) 721-9981
