Subject: scsipi_done() called twice when usb disk detaches - what's broken?
To: None <tech-kern@netbsd.org>
From: Nathan J. Williams <nathanw@wasabisystems.com>
List: tech-kern
Date: 09/22/2005 18:29:44
Lately I've been looking into the various failure modes that occur if
a USB storage device is detached while a data operation is in
progress. The latest goes something like this:

 1. The device is physically removed.

 2. The SCSI command in progress fails. umass_scsipi_cb() sets xs->error
    to XS_RESET and calls scsipi_done(xs), which puts xs on the
    chan->chan_complete queue, expecting that
    scsipi_completion_thread() will complete the command.

 3. USB code notices that the device has been removed and calls
    config_detach() on the umass device. The umass device detaches the
    scsibus device. scsibusdetach() iterates over the outstanding
    commands and calls scsipi_done() on each of them. In the process,
    it calls scsipi_done() on the same transfer that was already
    passed to scsipi_done() in step 2. That transfer is still sitting
    on chan->chan_complete, so queueing it again mangles the queue.

 4. The scsibus kthread runs, processes the xfer on the completion
    queue, and calls scsipi_complete(xs), which puts the xfer back into
    the scsipi_xfer_pool. However, due to the queue mangling above,
    the next thing on the queue is still the same xfer, which gets
    processed again. By this time the pool code has mangled the xfer's
    pointers, and the TAILQ_REMOVE() causes a fault. "Boom."
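
The queue corruption in steps 3 and 4 can be reproduced in userland
with plain <sys/queue.h>. This is a minimal sketch, not the scsipi
code: struct xfer, done_unguarded() and
double_done_leaves_xfer_queued() are made-up names, and it assumes
that scsipi_done() appends with TAILQ_INSERT_TAIL() and that the xfer
is the only entry on the queue.

```c
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for struct scsipi_xfer: queue linkage only. */
struct xfer {
	TAILQ_ENTRY(xfer) channel_q;
};
TAILQ_HEAD(xfer_queue, xfer);

/* Models a scsipi_done() that queues unconditionally (assumption). */
void
done_unguarded(struct xfer_queue *q, struct xfer *xs)
{
	TAILQ_INSERT_TAIL(q, xs, channel_q);
}

/*
 * Returns 1 if, after two done calls and one removal, the queue head
 * still points at the same xfer; returns 0 otherwise.
 */
int
double_done_leaves_xfer_queued(void)
{
	struct xfer_queue q = TAILQ_HEAD_INITIALIZER(q);
	struct xfer xs;

	done_unguarded(&q, &xs);	/* step 2: umass_scsipi_cb() */
	done_unguarded(&q, &xs);	/* step 3: scsibusdetach() */

	/* The second insert made the entry its own successor. */
	if (TAILQ_NEXT(&xs, channel_q) != &xs)
		return 0;

	/* step 4: the completion thread dequeues the xfer once... */
	TAILQ_REMOVE(&q, &xs, channel_q);

	/* ...but the head still points at it. */
	return TAILQ_FIRST(&q) == &xs;
}
```

With the traditional macro definitions the second insert leaves the
entry linked to itself, so the removal in step 4 does not actually
take it off the queue. In the kernel the xfer has been returned to
the pool by then, which is where the fault comes from.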


Which level is at fault here? This can be worked around by making
scsipi_done() ignore transfers that have already been through it (by
checking for XS_STS_DONE), but perhaps something should have avoided
this situation in the first place (in which case scsipi_done() could
use a KASSERT that XS_STS_DONE is *not* set).
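
The two options could look roughly like this. It is a userland
model, not a patch: struct xs_model, done_tolerant(), done_strict()
and tolerant_double_done_queue_length() are hypothetical names,
assert() stands in for KASSERT(), and the value of XS_STS_DONE is
assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

#define XS_STS_DONE	0x01	/* value assumed for this sketch */

/* Hypothetical stand-in for struct scsipi_xfer. */
struct xs_model {
	int xs_status;
	TAILQ_ENTRY(xs_model) channel_q;
};
TAILQ_HEAD(xs_queue, xs_model);

/* Workaround: silently ignore a transfer that is already done. */
int
done_tolerant(struct xs_queue *q, struct xs_model *xs)
{
	if (xs->xs_status & XS_STS_DONE)
		return 0;		/* second call: nothing to do */
	xs->xs_status |= XS_STS_DONE;
	TAILQ_INSERT_TAIL(q, xs, channel_q);
	return 1;
}

/* Alternative: callers must never get here twice; trap if they do. */
void
done_strict(struct xs_queue *q, struct xs_model *xs)
{
	assert((xs->xs_status & XS_STS_DONE) == 0);
	xs->xs_status |= XS_STS_DONE;
	TAILQ_INSERT_TAIL(q, xs, channel_q);
}

/* Usage: two tolerant done calls leave exactly one queue entry. */
int
tolerant_double_done_queue_length(void)
{
	struct xs_queue q = TAILQ_HEAD_INITIALIZER(q);
	struct xs_model xs = { 0 };
	struct xs_model *it;
	int n = 0;

	done_tolerant(&q, &xs);
	done_tolerant(&q, &xs);
	TAILQ_FOREACH(it, &q, channel_q)
		n++;
	return n;
}
```

The tolerant version hides the double call, while the strict one
turns it into an immediate, debuggable failure at the second caller.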

        - Nathan