Re: CVS commit: src/sys

To: Taylor R Campbell <campbell%netbsd.org@localhost>, matthew green <mrg%netbsd.org@localhost>
Subject: Re: CVS commit: src/sys
From: Nick Hudson <nick.hudson%gmx.co.uk@localhost>
Date: Fri, 10 Aug 2018 18:54:10 +0100

On 10/08/2018 07:31, Taylor R Campbell wrote:
[snip]

pull across abort fixes from nick-nhusb.  add more abort fixes, using
ideas from Taylor and Nick, and myself.  special thanks to both who
inspired much of the code here, if not wrote it directly.


Unfortunately, there is a hole in this protocol which I didn't think
of until after you committed.

Consider the following sequence of events:

CPU 0                           CPU 1
-----                           -----
usbd_alloc_xfer
usbd_setup_xfer
usbd_transfer
callout_schedule
ehci_idone
                                 ehci_timeout
mutex_enter in ehci_idone
                                 (ehci_timeout loses race to acqure bus lock)
callout_stop (doesn't stop fired callout)
(xfer callback)
usbd_transfer
callout_schedule
                                 mutex_enter in ehci_timeout
                                 ehci_timeout_task
                                 aborts task instantly
                                 (xfer callback)
usbd_free_xfer
                                 ehci_timeout
callout_stop (doesn't stop fired callout)
kmem_free
                                 (stuff in ehci_timeout)

There are two separate bugs here.

1. The _second_ xfer (sharing the same struct usbd_xfer data
    structure but representing a subsequent transfer) is timed out
    instantly when it shouldn't be timed out until it requested.

2. The callout can run after usbd_free_xfer -- use-after-free.

The fix for the second bug is simple: use callout_halt (and
usb_rem_task_wait) in usbd_free_xfer, rather than the #if DIAGNOSTIC
callout_stop bogosity that it does right now.

The second bug should never occur as the intention was for drivers toalways use xfers until detach and detach would abort correctly.

However, the fix for the first bug requires more care.  As the timeout
mechanism works now, we need to ensure the callout and task have
_finished_ before we resubmit another xfer using the same usbd_xfer
object.


I agree this is a concern.

 But there are two reasons why we can't do this:

(a) The hardware completion interrupt handler can't sleep, so it can't
     use callout_halt.  Now, maybe you could argue that callout_halt on
     a callout whose action only acquires locks is OK for a softintr.
     But it _certainly_ can't use usb_rem_task_wait which does a
     cv_wait on some hardware condition.


All completions are done in softint, so I'm not sure this part applies.

(b) There is nothing in the USBD API that happens _between_ xfer
     callback and when you can next usbd_setup_xfer and usbd_transfer.
     In fact, many drivers assume they can do exactly that straight
     from the xfer callback in softintr context, so we can't even add
     an API that does a wait.

Fortunately, there is no need to repeatedly wait for the callout and
task to complete.  We just need a way to _request_ that they
reschedule, if they are still running.


Perhaps the fix is still appropriate despite my comments?

Nick

Follow-Ups:
- Re: CVS commit: src/sys
  - From: Taylor R Campbell

References:
- Re: CVS commit: src/sys
  - From: Taylor R Campbell

Prev by Date: Re: CVS commit: src/sys
Next by Date: Re: CVS commit: src/sys
Previous by Thread: Re: CVS commit: src/sys
Next by Thread: Re: CVS commit: src/sys
Indexes:

Home | Main Index | Thread Index | Old Index