tech-net archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Breaking a usbnet deadlock with async multicast filter updates



There's a deadlock in some usbnet drivers, such as ure(4), arising
from a lock dependency cycle:

- usbnet core lock
- kpause / running callouts
- softnet_lock

The deadlock manifests through SIOCADDMULTI/SIOCDELMULTI, which unlike
all other ioctls sometimes run under the short-term packet-processing
lock softnet_lock and not the long-term configuration lock IFNET_LOCK:

- ure detaches and holds the usbnet core lock over kpause in ure_reset
  which awaits softint softclock `lock' to wake callouts
- callout holds the softint softclock `lock' and awaits softnet_lock
  (e.g., in frag6_fasttimo or any various other network callouts)
- mdnsd holds softnet_lock and awaits usbnet core lock in
  ure_ioctl(SIOCDELMULTI)

The attached patch adapts usbnet(9) to split SIOCADDMULTI/SIOCDELMULTI
handling into two stages:

1. Synchronously update the ethercom's multicast address list with
   ether_ioctl/addmulti/delmulti.

2. Defer updating the hardware's multicast filter to an asynchronous
   USB task.

This breaks the deadlock by making usbnet's SIOCADDMULTI/SIOCDELMULTI
never take the usbnet core lock.

However, it may delay updating the hardware's multicast filter a
little bit -- the filter won't be updated synchronously by the time
if_mcast_op returns.

Is this delay a problem?

I would assume no -- it seems like the worst possible outcome is that
we might lose a race with a few packets that get dropped on the floor
because the hardware's multicast filter wasn't told to accept them
yet, or we might lose a race and get a few packets that the hardware
was supposed to ignore, but neither of these seems particularly bad to
me.

But I don't understand this stuff very well so perhaps my assessment
is wrong.


Another approach would be to use a different lock to exclude
SIOCADDMULTI/SIOCDELMULTI -- maybe a short-term lock (never held over
kpause, safe to take under softnet_lock) together with a flag that
says `skip updating the hardware filter because reset (init or stop)
is in progress and init will do it before bringing the interface back
up anyway'.

Perhaps the usbnet core lock is the right lock for that -- after all,
we already have IFNET_LOCK as a long-term configuration change lock --
but it will require some more substantial changes to all the usbnet
drivers so they _don't_ hold the usbnet core lock over device reset
waits.  Certainly the asynchronous update approach will make pullups
to netbsd-9 much easier.
From 876b21f1f5080832090bd04ff15204740d5b9eb5 Mon Sep 17 00:00:00 2001
From: Taylor R Campbell <riastradh%NetBSD.org@localhost>
Date: Sun, 26 Dec 2021 00:21:10 +0000
Subject: [PATCH] usbnet: Defer hardware multicast filter updates to USB task.

Breaks deadlock:

- usbnet_detach holds usbnet lock, awaits kpause in ure_reset
- callout holds softclock `lock' (sequential softints, blocks kpause
  wakeup), awaits softnet_lock in tcp_timer_keep, frag6_fasttimo, &c.
- soclose holds softnet_lock, awaits usbnet lock in SIOCDELMULTI

This change breaks the deadlock by not passing the SIOCADDMULTI or
SIOCDELMULTI ioctl synchronously to the driver, which typically takes
the usbnet lock.

With this change, the ethernet layer still maintains the list of
multicast addresses synchronously, but we defer the driver logic that
updates the hardware multicast filter to an asynchronous USB task
without softnet_lock held.

This doesn't cause exactly the same ioctl to be sent to the driver --
usbnet just sends SIOCDELMULTI with an all-zero struct ifreq, and
might drop some ioctls if issued in quick succession.  This is OK
because none of the drivers actually distinguish between SIOCADDMULTI
and SIOCDELMULTI, or examine the argument; the drivers just commit
whatever multicast addresses are listed in the ethercom.
---
 sys/dev/usb/usbnet.c | 98 ++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 95 insertions(+), 3 deletions(-)

diff --git a/sys/dev/usb/usbnet.c b/sys/dev/usb/usbnet.c
index 1712d2f687f0..9fa061b3cea0 100644
--- a/sys/dev/usb/usbnet.c
+++ b/sys/dev/usb/usbnet.c
@@ -72,6 +72,7 @@ struct usbnet_private {
 
 	struct ethercom		unp_ec;
 	struct mii_data		unp_mii;
+	struct usb_task		unp_mcasttask;
 	struct usb_task		unp_ticktask;
 	struct callout		unp_stat_ch;
 	struct usbd_pipe	*unp_ep[USBNET_ENDPT_MAX];
@@ -1035,12 +1036,64 @@ usbnet_if_ioctl(struct ifnet *ifp, u_long cmd, void *data)
 		return uno_override_ioctl(un, ifp, cmd, data);
 
 	error = ether_ioctl(ifp, cmd, data);
-	if (error == ENETRESET)
-		error = uno_ioctl(un, ifp, cmd, data);
+	if (error == ENETRESET) {
+		switch (cmd) {
+		case SIOCADDMULTI:
+		case SIOCDELMULTI:
+			usb_add_task(un->un_udev, &unp->unp_mcasttask,
+			    USB_TASKQ_DRIVER);
+			error = 0;
+			break;
+		default:
+			error = uno_ioctl(un, ifp, cmd, data);
+		}
+	}
 
 	return error;
 }
 
+static void
+usbnet_mcast_task(void *arg)
+{
+	USBNETHIST_FUNC();
+	struct usbnet * const un = arg;
+	struct usbnet_private * const unp = un->un_pri;
+	struct ifnet * const ifp = usbnet_ifp(un);
+	bool dying;
+	struct ifreq ifr;
+
+	USBNETHIST_CALLARGSN(10, "%jd: enter", unp->unp_number, 0, 0, 0);
+
+	/*
+	 * If we're detaching, we must check unp_dying _before_
+	 * touching IFNET_LOCK -- the ifnet may have been detached by
+	 * the time this task runs.  This is racy -- unp_dying may be
+	 * set immediately after we test it -- but nevertheless safe,
+	 * because usbnet_detach waits for the task to complete before
+	 * issuing if_detach, and necessary, so that we don't touch
+	 * IFNET_LOCK after if_detach.  See usbnet_detach for details.
+	 */
+	mutex_enter(&unp->unp_core_lock);
+	dying = unp->unp_dying;
+	mutex_exit(&unp->unp_core_lock);
+	if (dying)
+		return;
+
+	/*
+	 * Pass a bogus ifr with SIOCDELMULTI -- the goal is to just
+	 * notify the driver to reprogram any hardware multicast
+	 * filter, according to what's already stored in the ethercom.
+	 * None of the drivers actually examine this argument, so it
+	 * doesn't change the ABI as far as they can tell.
+	 */
+	IFNET_LOCK(ifp);
+	if (ifp->if_flags & IFF_RUNNING) {
+		memset(&ifr, 0, sizeof(ifr));
+		(void)uno_ioctl(un, ifp, SIOCDELMULTI, &ifr);
+	}
+	IFNET_UNLOCK(ifp);
+}
+
 /*
  * Generic stop network function:
  *	- mark as stopping
@@ -1373,7 +1426,10 @@ usbnet_attach(struct usbnet *un,
 	un->un_pri = kmem_zalloc(sizeof(*un->un_pri), KM_SLEEP);
 	struct usbnet_private * const unp = un->un_pri;
 
-	usb_init_task(&unp->unp_ticktask, usbnet_tick_task, un, USB_TASKQ_MPSAFE);
+	usb_init_task(&unp->unp_mcasttask, usbnet_mcast_task, un,
+	    USB_TASKQ_MPSAFE);
+	usb_init_task(&unp->unp_ticktask, usbnet_tick_task, un,
+	    USB_TASKQ_MPSAFE);
 	callout_init(&unp->unp_stat_ch, CALLOUT_MPSAFE);
 	callout_setfunc(&unp->unp_stat_ch, usbnet_tick, un);
 
@@ -1506,6 +1562,8 @@ usbnet_detach(device_t self, int flags)
 	callout_halt(&unp->unp_stat_ch, NULL);
 	usb_rem_task_wait(un->un_udev, &unp->unp_ticktask, USB_TASKQ_DRIVER,
 	    NULL);
+	usb_rem_task_wait(un->un_udev, &unp->unp_mcasttask, USB_TASKQ_DRIVER,
+	    NULL);
 
 	mutex_enter(&unp->unp_core_lock);
 	unp->unp_refcnt--;
@@ -1534,6 +1592,40 @@ usbnet_detach(device_t self, int flags)
 	}
 	usbnet_ec(un)->ec_mii = NULL;
 
+	/*
+	 * We have already waited for the multicast task to complete.
+	 * Unfortunately, until if_detach, nothing has prevented it
+	 * from running again -- another thread might issue if_mcast_op
+	 * between the time of our first usb_rem_task_wait and the time
+	 * we actually get around to if_detach.
+	 *
+	 * Fortunately, the first usb_rem_task_wait ensures that if the
+	 * task is scheduled again, it will witness our setting of
+	 * unp_dying to true[*].  So after that point, if the task is
+	 * scheduled again, it will decline to touch IFNET_LOCK and do
+	 * nothing.  But we still need to wait for it to complete.
+	 *
+	 * It would be nice if we could write
+	 *
+	 *	if_pleasestopissuingmcastopsthanks(ifp);
+	 *	usb_rem_task_wait(..., &unp->unp_mcasttask, ...);
+	 *	if_detach(ifp);
+	 *
+	 * and then we would need only one usb_rem_task_wait.
+	 *
+	 * Unfortunately, there is no such operation available in
+	 * sys/net at the moment, and it would require a bit of
+	 * coordination with if_mcast_op and doifioctl probably under a
+	 * new lock.  So we'll use this kludge until that mechanism is
+	 * invented.
+	 *
+	 * [*] This is not exactly a documented property of the API,
+	 * but it is implied by the single lock in the task queue
+	 * serializing changes to the task state.
+	 */
+	usb_rem_task_wait(un->un_udev, &unp->unp_mcasttask, USB_TASKQ_DRIVER,
+	    NULL);
+
 	cv_destroy(&unp->unp_detachcv);
 	mutex_destroy(&unp->unp_core_lock);
 	mutex_destroy(&unp->unp_rxlock);


Home | Main Index | Thread Index | Old Index