NetBSD-Bugs archive
kern/59557: deadlock in ure(4), sysctl(9), and suspend/resume
>Number: 59557
>Category: kern
>Synopsis: deadlock in ure(4), sysctl(9), and suspend/resume
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Jul 26 02:05:00 +0000 2025
>Originator: Taylor R Campbell
>Release: 10
>Organization:
The DeadlockUSB Foundation
>Environment:
>Description:
1. I had a ure(4) device plugged in.
2. I suspended my laptop.
3. I unplugged the ure(4) device.
4. I resumed my laptop.
At this point, various processes wedged. New processes all
hung waiting for sysctl_treelock. The sysctl_treelock was held
as a _writer_ by the `sysctl -w hw.acpi.sleep.state=3' process,
which in turn was waiting in:
    sleepq_block
    cv_wait
    usbd_transfer
    usbd_do_request_len
    usbd_do_request
    ure_ctl.isra.0
    ure_uno_mii_write_reg
    mii_phy_reset
    rgephy_reset
    mii_phy_resume
    device_pmf_driver_resume
    pmf_device_resume
    pmf_system_resume
    acpi_enter_sleep_state
    sysctl_hw_acpi_sleepstate
    sysctl_dispatch
    sys___sysctl
    syscall
Two USB event threads were stuck in:
    sleepq_block
    turnstile_block
    rw_vector_enter
    sysctl_teardown
    ubt_detach
    config_detach
    usb_disconnect_port
    uhub_explore
    usb_discover
    usb_event_thread

    sleepq_block
    turnstile_block
    mutex_vector_enter
    usbnet_stop
    usbnet_detach
    config_detach
    usb_disconnect_port
    uhub_explore
    usb_discover
    usb_event_thread
One of the USB task threads was stuck in:
    sleepq_block
    turnstile_block
    mutex_vector_enter
    usbnet_tick_task
    usb_task_thread
Relevant parts of autoconf device tree:
    xhci0
      usb0
        uhub0
          umass0
            scsibus0
              sd0
          umass1
            scsibus1
              sd1
          ure0
            rgephy0
      usb1
        uhub1
          uhidev0
            uhid0
          ugenif0
          ubt0
          uvideo0
            video0
          ugen0
(I also have xhci1 with usb2->uhub2 and usb3->uhub3, but there
are no USB devices on uhub2 or uhub3.)
>How-To-Repeat:
1. plug in ure(4)
2. suspend
3. unplug ure(4)
4. resume
>Fix:
First, it's not clear to me why a write lock must be taken on
sysctl_treelock when we're only writing to a sysctl node, but
not modifying the tree:
https://nxr.netbsd.org/xref/src/sys/kern/kern_sysctl.c?r=1.271#312
This may not be the cause of the deadlock, but it is the cause
of various processes wedging even though they're not doing
anything with ure(4).
The USB event thread with a stack trace through ubt_detach
appears to be a red herring: it is blocked only because the
real deadlock happens to occur while sysctl_treelock is held
write-locked.
Similarly, the USB task thread with a stack trace through
usbnet_tick_task appears to be collateral damage.
The part I can't explain yet is this:
=> pmf_system_resume resumes xhci0 first, then ure0, then
rgephy0 (omitting intermediate nodes).  Resuming rgephy0 calls
mii_phy_resume, which takes the mii lock (struct
usbnet_private::unp_miilock) and then waits for a USB
transfer -- which will never complete because the device is
gone, but which _also_ isn't being aborted.
Then, once xhci0 is resumed and delivers an interrupt reporting
that a device has been disconnected, the USB event thread tries
to config_detach(ure0), which does usbnet_detach -> usbnet_stop
-> mutex_enter(unp->unp_miilock) and so blocks on the lock held
by the resume path.
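The two stuck threads can be sketched as a userland model, assuming POSIX threads in place of kernel threads. All names are hypothetical: `miilock` plays unp_miilock, and `xfer_done`, which nothing ever sets, plays completion of the control transfer issued from ure_ctl. The wait is bounded (1 s) only so the model terminates; the kernel's cv_wait has no timeout. A handshake keeps the result deterministic.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

static pthread_mutex_t miilock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lk = PTHREAD_MUTEX_INITIALIZER; /* guards below */
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static int xfer_done;		/* never set: the device is unplugged */
static int resume_has_miilock;
static int detach_checked;
static int resume_wait_rc;

/* mii_phy_resume-like path: hold miilock across a transfer wait. */
static void *
resume_path(void *arg)
{
	struct timespec deadline;

	(void)arg;
	pthread_mutex_lock(&miilock);
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += 1;	/* bounded only so the demo ends */

	pthread_mutex_lock(&lk);
	resume_has_miilock = 1;
	pthread_cond_broadcast(&cv);
	while (!xfer_done && resume_wait_rc == 0)
		resume_wait_rc = pthread_cond_timedwait(&cv, &lk, &deadline);
	/* Keep miilock until the detach side has looked at it. */
	while (!detach_checked)
		pthread_cond_wait(&cv, &lk);
	pthread_mutex_unlock(&lk);

	pthread_mutex_unlock(&miilock);
	return NULL;
}

/*
 * usbnet_stop-like path: once the resume side holds miilock, try it.
 * Returns the trylock result; *wait_rc gets the resume side's result.
 */
int
model_stuck_detach(int *wait_rc)
{
	pthread_t t;
	int rc, try_rc;

	rc = pthread_create(&t, NULL, resume_path, NULL);
	assert(rc == 0);

	pthread_mutex_lock(&lk);
	while (!resume_has_miilock)
		pthread_cond_wait(&cv, &lk);
	pthread_mutex_unlock(&lk);

	/* mutex_enter would sleep here; trylock just reports EBUSY. */
	try_rc = pthread_mutex_trylock(&miilock);
	if (try_rc == 0)
		pthread_mutex_unlock(&miilock);

	pthread_mutex_lock(&lk);
	detach_checked = 1;
	pthread_cond_broadcast(&cv);
	pthread_mutex_unlock(&lk);

	rc = pthread_join(t, NULL);
	assert(rc == 0);
	*wait_rc = resume_wait_rc;
	return try_rc;
}
```

In this model `model_stuck_detach(&w)` returns `EBUSY` with `w == ETIMEDOUT`. The detach side cannot take the lock while the resume side sleeps on a transfer that nothing completes or cancels.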
I anticipated this problem in usb_subr.c rev. 1.270 back in
2022, when I changed usb_disconnect_port to call
usbd_suspend_pipe before calling config_detach.
usbd_suspend_pipe sets pipe->up_aborting = true, which causes
any subsequent usbd_transfer calls to fail with USBD_CANCELLED,
and it calls the bus's upm_abort method for every transfer
already queued.
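The intended effect of usbd_suspend_pipe can be modeled the same way. This is only a sketch of what _should_ happen and does not explain why the real transfer was not woken. `struct pipe_model`, `sync_transfer`, and `suspend_pipe` are hypothetical. The single `aborting` flag plus a broadcast compresses both effects described above (setting up_aborting and calling upm_abort on queued transfers). It is not the real usbdi(9) code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

/* Hypothetical result codes, loosely after usbd_status. */
enum xfer_status { XFER_PENDING, XFER_COMPLETED, XFER_CANCELLED };

/* Hypothetical pipe model, loosely after struct usbd_pipe. */
struct pipe_model {
	pthread_mutex_t lock;
	pthread_cond_t cv;
	int aborting;		/* cf. pipe->up_aborting */
	int xfer_done;		/* set by a live device; never set here */
	int waiter_asleep;	/* handshake so the demo is deterministic */
};

static struct pipe_model pipe_m = {
	.lock = PTHREAD_MUTEX_INITIALIZER,
	.cv = PTHREAD_COND_INITIALIZER,
};
static pthread_mutex_t miilock = PTHREAD_MUTEX_INITIALIZER;
static enum xfer_status resume_result = XFER_PENDING;

/* Synchronous transfer: sleep until completed or aborted. */
static enum xfer_status
sync_transfer(struct pipe_model *p)
{
	enum xfer_status st;

	pthread_mutex_lock(&p->lock);
	if (p->aborting) {	/* new transfers fail immediately */
		pthread_mutex_unlock(&p->lock);
		return XFER_CANCELLED;
	}
	p->waiter_asleep = 1;
	pthread_cond_broadcast(&p->cv);
	while (!p->xfer_done && !p->aborting)
		pthread_cond_wait(&p->cv, &p->lock);
	st = p->xfer_done ? XFER_COMPLETED : XFER_CANCELLED;
	pthread_mutex_unlock(&p->lock);
	return st;
}

/* Mark the pipe aborting and wake any sleeping transfer. */
static void
suspend_pipe(struct pipe_model *p)
{
	pthread_mutex_lock(&p->lock);
	p->aborting = 1;
	pthread_cond_broadcast(&p->cv);
	pthread_mutex_unlock(&p->lock);
}

/* Resume path: hold miilock across the transfer. */
static void *
resume_path(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&miilock);
	resume_result = sync_transfer(&pipe_m);
	pthread_mutex_unlock(&miilock);
	return NULL;
}

/*
 * Disconnect path: abort the pipe first, then take miilock as
 * usbnet_stop would.  Without suspend_pipe() the lock never returns.
 */
enum xfer_status
model_abort_then_detach(void)
{
	pthread_t t;
	int rc;

	rc = pthread_create(&t, NULL, resume_path, NULL);
	assert(rc == 0);

	pthread_mutex_lock(&pipe_m.lock);
	while (!pipe_m.waiter_asleep)
		pthread_cond_wait(&pipe_m.cv, &pipe_m.lock);
	pthread_mutex_unlock(&pipe_m.lock);

	suspend_pipe(&pipe_m);
	rc = pthread_mutex_lock(&miilock);	/* succeeds after abort */
	assert(rc == 0);
	pthread_mutex_unlock(&miilock);

	rc = pthread_join(t, NULL);
	assert(rc == 0);
	return resume_result;
}
```

Here `model_abort_then_detach()` returns `XFER_CANCELLED`, and any later `sync_transfer(&pipe_m)` fails at once with `XFER_CANCELLED`. The bug report describes a case where the real counterpart of the wakeup apparently never reached the sleeping transfer.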
So that _should_ have caused the transfer in ure_ctl to wake up
and fail. But why didn't it? I don't know!