tech-kern archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: On softints, softnet_lock and sleeping (aka ipv6 vs USB network interfaces)



On 12/06/15 16:01, Taylor R Campbell wrote:
    Date: Sun, 6 Dec 2015 09:22:05 +0000
    From: Nick Hudson <skrll%netbsd.org@localhost>

    It seems to me that nd6_timer is either expecting too much of
    the USB stack by expecting a synchronous interface to changing
    multicast filters that doesn't sleep; or the USB stack should
    provide an asynchronous update method and any failure should be
    handled elsewhere.

One quick fix might be to change nd6_timer to call in6_purgeaddr in a
workqueue (or...task, if we had that).  It seems to me that
in6_purgeaddr is a relatively expensive operation (and I think there's
a bug: it calls callout_stop via nd6_dad_stop/nd6_dad_stoptimer when
it should probably call callout_halt), so calling it from a callout
doesn't seem right anyway.

Sure, but the "sleeping with softnet_lock held" problem still exists.

    Another problem in the PR is that

    1) CPU N (not 0) takes softnet_lock and requests a USB control transfer
    (which will sleep for completion)

    2) CPU 0 takes clock interrupt and nd6_timer expires.  nd6_timer starts and
    tries to take softnet lock and blocks

    3) CPU 0 also runs ipintr (not sure why) which takes softnet lock and locks

Aside: This is probably because ipintr gets scheduled on a specified
target CPU, not on the local CPU, in pktq_enqueue...and apparently
every caller, except for bridges, specifies CPU 0.

    4) CPU 0 receives USB HC interrupt for completed control transfer from CPU N
    and schedules softint process (at IPL_SOFTNET) which never runs as the lwp
    is blocked in step 3)

    Maybe

         290 #define IPL_SOFTUSB IPL_SOFTNET

    http://nxr.netbsd.org/xref/src/sys/dev/usb/usbdi.h#290

    should be changed to IPL_SOFTBIO?

As a practical matter, I don't see how that would help -- IPL_SOFTUSB
is only ever used as a mutex ipl, and IPL_SOFT* is equivalent to
IPL_NONE for the practical purposes of mutex(9).

Why do IPL_SOFT* exist? If the intention is to show usage then IPL_SOFT is
enough, right?
That aside, can softints even interrupt softints, or are the
priorities only about who goes first if two softints are scheduled
`simultaneously' (as far as softint_dispatch can discern)?

The latter, iiuc.
If so, even then, using SOFTINT_BIO instead of SOFTINT_NET wouldn't
help here -- SOFTINT_BIO is even lower-priority than SOFTINT_NET, but
you need to allow the USB HC interrupt to run (I assume you mean,
e.g., ehci_softintr?) faster than SOFTINT_NET in order to wake
usbd_transfer while a SOFTINT_NET lwp is blocked on softnet_lock.

Looking at spl(9) and the code again I meant SOFTINT_SERIAL. The idea would indeed be that the USB HC softint (e.g. ehci_softintr) would run before SOFTINT_NET handlers.


It seems to me the deeper problem is that we ever sleeping with
softnet_lock held at all, which (a) is wrong and (b) means it is wrong
to acquire softnet_lock in a softint.

Fixes welcome :)

Thanks,
Nick


Home | Main Index | Thread Index | Old Index