tech-kern archive


Re: On softints, softnet_lock and sleeping (aka ipv6 vs USB network interfaces)



Hi !

Is there any progress on this? I also still see PR/49065 on an RPI2. Running named with 4 threads on an RPI2 while vtund does "ifconfig tunX ..." is also a sure killer. Running named with only one thread gets you past it (maybe just most of the time).

softnet_lock should not be held while doing USB interactions.

Frank

On 12/06/15 10:22, Nick Hudson wrote:
Hi,

PR/50491 raises some questions that I need some guidance on.
Take this stack trace...

 Setting date via ntp.
 panic: assert_sleepable: softint caller=0x802e2014
 cpu2: Begin traceback...
 0xbada3c3c: netbsd:db_panic+0xc
 0xbada3c6c: netbsd:vpanic+0x1b0
 0xbada3c84: netbsd:snprintf
 0xbada3cbc: netbsd:assert_sleepable+0xb4
 0xbada3d0c: netbsd:usbd_do_request_flags_pipe+0x28
 0xbada3d34: netbsd:usbd_do_request+0x38
 0xbada3d64: netbsd:smsc_write_reg+0x60
 0xbada3d8c: netbsd:smsc_setmulti+0x100
 0xbada3dbc: netbsd:smsc_ioctl+0x124
 0xbada3e64: netbsd:if_mcast_op+0x50
 0xbada3eb4: netbsd:in6_delmulti+0x154
 0xbada3ecc: netbsd:in6_leavegroup+0x20
 0xbada3ef4: netbsd:in6_purgeaddr+0x6c
 0xbada3f2c: netbsd:nd6_timer+0x108
 0xbada3f64: netbsd:callout_softclock+0x194
 0xbada3fac: netbsd:softint_dispatch+0xd4

It seems to me that either nd6_timer expects too much of the USB
stack, namely a synchronous interface for changing multicast
filters that doesn't sleep, or the USB stack should provide an
asynchronous update method, with any failure handled elsewhere.
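
To make the asynchronous option concrete, here is a minimal userspace
sketch of the deferral pattern. It is not NetBSD code: fake_softc,
request_mcast_update, mcast_task and blocking_write_filter are
hypothetical names, and the in_softint flag only models the check that
assert_sleepable performs. The idea is that the ioctl path merely
records that the filter needs updating, and a task running in thread
context does the sleeping register write later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not NetBSD code. */
static bool in_softint;        /* true while "running" a softint handler */
static int  hw_filter_writes;  /* counts simulated blocking register writes */

struct fake_softc {
    bool mcast_update_pending; /* set from any context, consumed by the task */
};

/*
 * Stand-in for a synchronous USB control transfer. It may sleep, so it
 * must never be reached from softint context.
 */
static void
blocking_write_filter(struct fake_softc *sc)
{
    assert(!in_softint);       /* models the spirit of assert_sleepable */
    (void)sc;
    hw_filter_writes++;
}

/*
 * Called from the ioctl path, possibly in softint context.
 * It only records the request and never sleeps.
 */
static void
request_mcast_update(struct fake_softc *sc)
{
    sc->mcast_update_pending = true;
}

/*
 * Runs later in thread context (in a kernel this would be some task or
 * workqueue thread). Several requests collapse into one write.
 */
static void
mcast_task(struct fake_softc *sc)
{
    if (!sc->mcast_update_pending)
        return;
    sc->mcast_update_pending = false;
    blocking_write_filter(sc);
}
```

In this sketch the caller gets no error back from the hardware write,
which is the "failure handled elsewhere" part of the trade-off: the
task would have to log or retry on its own.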

Another problem in the PR is that

1) CPU N (not 0) takes softnet_lock and requests a USB control transfer
(which will sleep for completion)

2) CPU 0 takes a clock interrupt and nd6_timer expires. nd6_timer starts,
tries to take softnet_lock, and blocks

3) CPU 0 also runs ipintr (not sure why), which tries to take softnet_lock
and blocks

4) CPU 0 receives the USB HC interrupt for the completed control transfer
from CPU N and schedules the softint (at IPL_SOFTNET), which never runs
because the lwp is blocked in step 3)

Maybe

    290 #define IPL_SOFTUSB IPL_SOFTNET

http://nxr.netbsd.org/xref/src/sys/dev/usb/usbdi.h#290

should be changed to IPL_SOFTBIO?

Thoughts?

Nick


