[src/trunk]: src/sys/dev/usb Fix steady state of timeouts in ehci.

To: source-changes-hg%NetBSD.org@localhost
Subject: [src/trunk]: src/sys/dev/usb Fix steady state of timeouts in ehci.
From: riastradh <riastradh%NetBSD.org@localhost>
Date: Wed, 12 Feb 2020 18:06:30 +0000
details:   https://anonhg.NetBSD.org/src/rev/bf423600bdcb
branches:  trunk
changeset: 744766:bf423600bdcb
user:      riastradh <riastradh%NetBSD.org@localhost>
date:      Wed Feb 12 16:00:34 2020 +0000

description:
Fix steady state of timeouts in ehci.

This is complicated because:

1. There are three ways that an xfer can be completed:
   (a) hardware interrupt completes xfer
   (b) software decision aborts xfer with USBD_CANCELLED
   (c) timeout aborts xfer with USBD_TIMEOUT

2. The timeout abort can't be done in callout because ehci_sync_hc,
   called unconditionally by ehci_abort_xfer to wait until the device
   has finished using any references to the xfer, may sleep.  So we
   have to schedule a callout that, when run, will schedule a usb_task.

3. The hardware completion interrupt can't sleep waiting for a callout
   or task to finish -- can't use callout_halt or usb_rem_task_wait.
   So the callout and usb_task must be able to run _after_ the hardware
   completion interrupt, and recognize that they're late to the party.

   (Note, though, that usbd_free_xfer does wait for the callout and
   task to complete, so there's no danger they may use themselves after
   free.)

4. The xfer may resubmitted -- and the timeout may be rescheduled --
   immediately after the hardware completion interrupt, _while_ the
   callout and/or usb_task may still be scheduled.  Specifically, we
   may have the following sequence of events:
   (a) hardware completion interrupt
   (b) callout or usb_task fires
   (c) driver resubmits xfer
   (d) callout or usb_task acquires lock and looks around dazed and
       bewildered at the firehose of events like reading the news in 2019

The mechanism for sorting this out is that we have two bits of state:

- xfer->ux_timeout_set informs the driver, when submitting an xfer and
  setting up its timeout, whether either the callout or usb_task is
  already scheduled or not.

- xfer->ux_timeout_reset informs the callout or usb_task whether it
  should reschedule the callout, because the xfer got resubmitted, or
  not.

diffstat:

 sys/dev/usb/ehci.c |  331 +++++++++++++++++++++++++++++++++++++++++++++-------
 1 files changed, 287 insertions(+), 44 deletions(-)

diffs (truncated from 425 to 300 lines):

diff -r 03db0f0ee02d -r bf423600bdcb sys/dev/usb/ehci.c
--- a/sys/dev/usb/ehci.c        Wed Feb 12 16:00:17 2020 +0000
+++ b/sys/dev/usb/ehci.c        Wed Feb 12 16:00:34 2020 +0000
@@ -1,4 +1,4 @@
-/*     $NetBSD: ehci.c,v 1.269 2020/02/04 06:30:46 mrg Exp $ */
+/*     $NetBSD: ehci.c,v 1.270 2020/02/12 16:00:34 riastradh Exp $ */
 
 /*
  * Copyright (c) 2004-2012 The NetBSD Foundation, Inc.
@@ -53,7 +53,7 @@
  */
 
 #include <sys/cdefs.h>
-__KERNEL_RCSID(0, "$NetBSD: ehci.c,v 1.269 2020/02/04 06:30:46 mrg Exp $");
+__KERNEL_RCSID(0, "$NetBSD: ehci.c,v 1.270 2020/02/12 16:00:34 riastradh Exp $");
 
 #include "ohci.h"
 #include "uhci.h"
@@ -168,6 +168,9 @@
 Static void            ehci_idone(struct ehci_xfer *, ex_completeq_t *);
 Static void            ehci_timeout(void *);
 Static void            ehci_timeout_task(void *);
+Static bool            ehci_probe_timeout(struct usbd_xfer *);
+Static void            ehci_schedule_timeout(struct usbd_xfer *);
+Static void            ehci_cancel_timeout_async(struct usbd_xfer *);
 Static void            ehci_intrlist_timeout(void *);
 Static void            ehci_doorbell(void *);
 Static void            ehci_pcd(void *);
@@ -1057,13 +1060,11 @@
        }
 
        /*
-        * Cancel the timeout and the task, which have not yet
-        * run.  If they have already fired, at worst they are
-        * waiting for the lock.  They will see that the xfer
-        * is no longer in progress and give up.
+        * We are completing the xfer.  Cancel the timeout if we can,
+        * but only asynchronously.  See ehci_cancel_timeout_async for
+        * why we need not wait for the callout or task here.
         */
-       callout_stop(&xfer->ux_callout);
-       usb_rem_task(xfer->ux_pipe->up_dev, &xfer->ux_aborttask);
+       ehci_cancel_timeout_async(xfer);
 
 #ifdef DIAGNOSTIC
 #ifdef EHCI_DEBUG
@@ -3235,35 +3236,38 @@
        KASSERT(mutex_owned(&sc->sc_lock));
        ASSERT_SLEEPABLE();
 
+       /*
+        * Nobody else can set this status: only one caller can time
+        * out, and only one caller can synchronously abort.  So the
+        * status can't be the status we're trying to set this to.
+        */
+       KASSERT(xfer->ux_status != status);
+
+       /*
+        * If host controller or timer interrupt has completed it, too
+        * late to abort.  Forget about it.
+        */
+       if (xfer->ux_status != USBD_IN_PROGRESS)
+               return;
+
        if (status == USBD_CANCELLED) {
                /*
-                * We are synchronously aborting.  Try to stop the
-                * callout and task, but if we can't, wait for them to
-                * complete.
+                * We are synchronously aborting.  Cancel the timeout
+                * if we can, but only asynchronously.  See
+                * ehci_cancel_timeout_async for why we need not wait
+                * for the callout or task here.
                 */
-               callout_halt(&xfer->ux_callout, &sc->sc_lock);
-               usb_rem_task_wait(xfer->ux_pipe->up_dev, &xfer->ux_aborttask,
-                   USB_TASKQ_HC, &sc->sc_lock);
+               ehci_cancel_timeout_async(xfer);
        } else {
-               /* Otherwise, we are timing out.  */
+               /*
+                * Otherwise, we are timing out.  No action needed to
+                * cancel the timeout because we _are_ the timeout.
+                * This case should happen only via the timeout task,
+                * invoked via the callout.
+                */
                KASSERT(status == USBD_TIMEOUT);
        }
 
-       /*
-        * The xfer cannot have been cancelled already.  It is the
-        * responsibility of the caller of usbd_abort_pipe not to try
-        * to abort a pipe multiple times, whether concurrently or
-        * sequentially.
-        */
-       KASSERT(xfer->ux_status != USBD_CANCELLED);
-
-       /* Only the timeout, which runs only once, can time it out.  */
-       KASSERT(xfer->ux_status != USBD_TIMEOUT);
-
-       /* If anyone else beat us, we're done.  */
-       if (xfer->ux_status != USBD_IN_PROGRESS)
-               return;
-
        /* We beat everyone else.  Claim the status.  */
        xfer->ux_status = status;
 
@@ -3473,6 +3477,16 @@
        KASSERT(mutex_owned(&sc->sc_lock));
 }
 
+/*
+ * ehci_timeout(xfer)
+ *
+ *     Called at IPL_SOFTCLOCK when too much time has elapsed waiting
+ *     for xfer to complete.  Since we can't abort the xfer at
+ *     IPL_SOFTCLOCK, defer to a usb_task to run it in thread context,
+ *     unless the xfer has completed or aborted concurrently -- and if
+ *     the xfer has also been resubmitted, take care of rescheduling
+ *     the callout.
+ */
 Static void
 ehci_timeout(void *addr)
 {
@@ -3489,12 +3503,38 @@
        }
 #endif
 
+       /* Acquire the lock so we can transition the timeout state.  */
        mutex_enter(&sc->sc_lock);
-       if (!sc->sc_dying && xfer->ux_status == USBD_IN_PROGRESS)
+
+       /*
+        * Use ehci_probe_timeout to check whether the timeout is still
+        * valid, or to reschedule the callout if necessary.  If it is
+        * still valid, schedule the task.
+        */
+       if (ehci_probe_timeout(xfer))
                usb_add_task(dev, &xfer->ux_aborttask, USB_TASKQ_HC);
+
+       /*
+        * Notify ehci_cancel_timeout_async that we may have scheduled
+        * the task.  This causes callout_invoking to return false in
+        * ehci_cancel_timeout_async so that it can tell which stage in
+        * the callout->task->abort process we're at.
+        */
+       callout_ack(&xfer->ux_callout);
+
+       /* All done -- release the lock.  */
        mutex_exit(&sc->sc_lock);
 }
 
+/*
+ * ehci_timeout_task(xfer)
+ *
+ *     Called in thread context when too much time has elapsed waiting
+ *     for xfer to complete.  Issue ehci_abort_xfer(USBD_TIMEOUT),
+ *     unless the xfer has completed or aborted concurrently -- and if
+ *     the xfer has also been resubmitted, take care of rescheduling
+ *     the callout.
+ */
 Static void
 ehci_timeout_task(void *addr)
 {
@@ -3505,11 +3545,223 @@
 
        DPRINTF("xfer=%#jx", (uintptr_t)xfer, 0, 0, 0);
 
+       /* Acquire the lock so we can transition the timeout state.  */
        mutex_enter(&sc->sc_lock);
-       ehci_abort_xfer(xfer, USBD_TIMEOUT);
+
+       /*
+        * Use ehci_probe_timeout to check whether the timeout is still
+        * valid, or to reschedule the callout if necessary.  If it is
+        * still valid, schedule the task.
+        */
+       if (ehci_probe_timeout(xfer))
+               ehci_abort_xfer(xfer, USBD_TIMEOUT);
+
+       /* All done -- release the lock.  */
        mutex_exit(&sc->sc_lock);
 }
 
+/*
+ * ehci_probe_timeout(xfer)
+ *
+ *     Probe the status of xfer's timeout.  Acknowledge and process a
+ *     request to reschedule.  Return true if the timeout is still
+ *     valid and the caller should take further action (queueing a
+ *     task or aborting the xfer), false if it must stop here.
+ */
+Static bool
+ehci_probe_timeout(struct usbd_xfer *xfer)
+{
+       EHCIHIST_FUNC(); EHCIHIST_CALLED();
+       struct ehci_softc *sc = EHCI_XFER2SC(xfer);
+       bool valid;
+
+       KASSERT(sc->sc_bus.ub_usepolling || mutex_owned(&sc->sc_lock));
+
+       /* The timeout must be set.  */
+       KASSERT(xfer->ux_timeout_set);
+
+       /*
+        * Neither callout nor task may be pending; they execute
+        * alternately in lock step.
+        */
+       KASSERT(!callout_pending(&xfer->ux_callout));
+       KASSERT(!usb_task_pending(xfer->ux_pipe->up_dev, &xfer->ux_aborttask));
+
+       /* There are a few cases... */
+       if (sc->sc_dying) {
+               /* Host controller dying.  Drop it all on the floor.  */
+               xfer->ux_timeout_set = false;
+               xfer->ux_timeout_reset = false;
+               valid = false;
+       } else if (xfer->ux_timeout_reset) {
+               /*
+                * The xfer completed _and_ got resubmitted while we
+                * waited for the lock.  Acknowledge the request to
+                * reschedule, and reschedule it if there is a timeout
+                * and the bus is not polling.
+                */
+               xfer->ux_timeout_reset = false;
+               if (xfer->ux_timeout && !sc->sc_bus.ub_usepolling) {
+                       KASSERT(xfer->ux_timeout_set);
+                       callout_schedule(&xfer->ux_callout,
+                           mstohz(xfer->ux_timeout));
+               } else {
+                       /* No more callout or task scheduled.  */
+                       xfer->ux_timeout_set = false;
+               }
+               valid = false;
+       } else if (xfer->ux_status != USBD_IN_PROGRESS) {
+               /*
+                * The xfer has completed by hardware completion or by
+                * software abort, and has not been resubmitted, so the
+                * timeout must be unset, and is no longer valid for
+                * the caller.
+                */
+               xfer->ux_timeout_set = false;
+               valid = false;
+       } else {
+               /*
+                * The xfer has not yet completed, so the timeout is
+                * valid.
+                */
+               valid = true;
+       }
+
+       /* Any reset must have been processed.  */
+       KASSERT(!xfer->ux_timeout_reset);
+
+       /*
+        * Either we claim the timeout is set, or the callout is idle.
+        * If the timeout is still set, we may be handing off to the
+        * task instead, so this is an if but not an iff.
+        */
+       KASSERT(xfer->ux_timeout_set || !callout_pending(&xfer->ux_callout));
+
+       /*
+        * The task must be idle now.
+        *
+        * - If the caller is the callout, _and_ the timeout is still
+        *   valid, the caller will schedule it, but it hasn't been
+        *   scheduled yet.  (If the timeout is not valid, the task
+        *   should not be scheduled.)
+        *
+        * - If the caller is the task, it cannot be scheduled again
+        *   until the callout runs again, which won't happen until we
+        *   next release the lock.
+        */
+       KASSERT(!usb_task_pending(xfer->ux_pipe->up_dev, &xfer->ux_aborttask));
+
+       KASSERT(sc->sc_bus.ub_usepolling || mutex_owned(&sc->sc_lock));
+
+       return valid;
+}
+
+/*
+ * ehci_schedule_timeout(xfer)
+ *
+ *     Ensure that xfer has a timeout.  If the callout is already
+ *     queued or the task is already running, request that they
+ *     reschedule the callout.  If not, and if we're not polling,
+ *     schedule the callout anew.
+ */
+Static void
+ehci_schedule_timeout(struct usbd_xfer *xfer)
+{
+       EHCIHIST_FUNC(); EHCIHIST_CALLED();
+       ehci_softc_t *sc = EHCI_XFER2SC(xfer);
+
+       KASSERT(sc->sc_bus.ub_usepolling || mutex_owned(&sc->sc_lock));
+
+       if (xfer->ux_timeout_set) {
+               /*
+                * Callout or task has fired from a prior completed
+                * xfer but has not yet noticed that the xfer is done.
Prev by Date: [src/trunk]: src/sys/dev/usb New xfer state variables ux_timeout_set and ux_t...
Next by Date: [src/trunk]: src/sys Factor out HCI-independent xfer completion logic.
Previous by Thread: [src/trunk]: src/sys/dev/usb New xfer state variables ux_timeout_set and ux_t...
Next by Thread: [src/trunk]: src/sys Factor out HCI-independent xfer completion logic.
Indexes:
Home | Main Index | Thread Index | Old Index