tech-kern archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: usbdi(9) improvements



New patch to fix abort races and help USB device drivers -- with a
less intrusive change to aborting pipes for now.  Since this still
changes the usbdi(9) API and ABI, though, it will require a kernel
version bump.

Review, testing, and comments welcome!


Changes to usbdi(9):

- The following functions which previously returned a usbd_status but
  never failed in practice now return void:

  . usbd_close_pipe
  . usbd_abort_pipe
  . usbd_abort_default_pipe

- usbd_transfer no longer races with usbd_abort_pipe to detect when
  the pipe is currently being aborted.

- New functions usbd_suspend_pipe and usbd_resume_pipe.

  . usbd_suspend_pipe fails any new and future xfers with
    USBD_CANCELLED.

  . usbd_resume_pipe allows xfers on a suspended pipe again.

  The existing function usbd_abort_pipe is now the same as doing
  usbd_suspend_pipe and then usbd_resume_pipe -- no change in its
  semantics.

  For many drivers, the difference between usbd_abort_pipe and
  usbd_suspend_pipe is immaterial because they only
  open/transfer/abort/close, in that order.

  But there are some drivers like ucom(4) which (a) open pipes on
  attach, (b) abort on pipes on close, (c) expect to be able to use
  the pipes again on reopen, and (d) only close pipes on detach.
  These drivers need the effect of usbd_abort_pipe to be transient.

  There are also some drivers that need usbd_suspend_pipe semantics
  (persistently cancel xfers on the pipe), and _not_ usbd_abort_pipe
  semantics (cancel current xfers, allow future xfers).  Specifically,
  drivers that issue _synchronous_ transfers like usbd_sync_transfer
  or usbd_bulk_transfer need usbd_suspend_pipe to avoid a race between
  I/O and abort:

	/* I/O thread */
	if (sc->sc_dying)
		return EIO;
	// (*)
	err = usbd_bulk_transfer(...);

	/* abort thread */
	sc->sc_dying = true;
	usbd_abort_pipe(...);
	wait_for_io_thread(...);
	usbd_destroy_xfer(...);

  If the abort thread runs at the time of (*) in the I/O thread, this
  is a deadlock.  Using usbd_suspend_pipe instead of usbd_abort_pipe
  resolves this deadlock -- usbd_bulk_transfer will fail with
  USBD_CANCELLED in this case.

  => The previous patch series made the effect of usbd_abort_pipe
     itself persistent, but that changed the semantics in a way that
     would break drivers, and I don't want to have to audit all the
     drivers -- and, e.g., change ucom(4) to open pipes on open
     instead of attach, and close pipes on close instead of detach --
     before netbsd-10.

  => Another alternative is to give usbd_bulk_transfer an interlock
     like cv_wait, so sc_dying and the transfer/abort can be done
     under the interlock to avoid a race.  The usbd_suspend_pipe
     semantics seems like a simpler way to solve this problem.


Changes to host controller interface API:

- struct usbd_pipe_methods::upm_transfer is no longer responsible for
  calling usb_insert_transfer to insert the transfer in the pipe's
  queue.  usbd_transfer takes care of that itself.

  This was necessary to fix the usbd_transfer / usbd_abort_pipe race
  mentioned above.

- struct usbd_pipe_methods::upm_transfer and upm_start are now called
  under the pipe lock.

  The logic in usbd_transfer is now basically:

	usbd_lock_pipe(pipe);
	if (pipe->up_aborting) fail with USBD_ECANCELLED;
	SIMPLEQ_INSERT(pipe->up_queue, xfer);
        pipe->up_methods->upm_transfer(xfer);
        // usually calls upm_start
	usbd_unlock_pipe(pipe);

  This fixes racy access to the pipe's queue, which could interact
  badly with abort (and might explain any of various obscure USB crash
  PRs we have open).

- struct usbd_pipe_methods::upm_transfer is forbidden from returning
  USBD_NORMAL_COMPLETION for asynchronous xfers.  This would imply
  that the xfer has completed, which must not happen synchronously,
  because the caller may hold locks that the callback needs to take,
  and doing that synchronously would lock-against-self.

  This rule was already implicit in the logic, but there is now a
  KASSERT to detect it.

- struct usbd_bus_methods::ubm_abortx is no longer responsible for
  calling usb_transfer_complete -- usbd_xfer_abort now does that.
  Note that .ubm_abortx MUST NOT release the pipe (bus) lock; this was
  a bug in xhci(4) abort.

- struct usbd_bus_methods::ubm_rhctrl is now called under the bus
  lock.


(I committed the xhci(4) abort/clear-endpoint-stall bug fix that was
included in the previous patch series -- it does not depend on any of
this, and it may be pullable-up to netbsd-9 this way.)


Home | Main Index | Thread Index | Old Index