NetBSD-Bugs archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

kern/48151: USB race condition can lead to panic



>Number:         48151
>Category:       kern
>Synopsis:       USB race condition can lead to panic
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Aug 23 16:45:00 +0000 2013
>Originator:     Aymeric Vincent
>Release:        -current 6.99.23
>Organization:
>Environment:
NetBSD z.local 6.99.23 NetBSD 6.99.23 (Z) #12: Thu Aug 22 16:04:56 CEST 2013  
vincent@z.local:/home/vincent/netbsd/src/sys/arch/amd64/compile/Z amd64

>Description:

usb_transfer_complete() releases the pipe lock to call the callback associated 
with a xfer.

Such callbacks may (see usbd_bulk_transfer_cb()) cv_broadcast() on the xfer's 
cv.

This will lead to usbd_transfer() being awoken and returning, which in turn 
will lead to the xfer being freed (and reused to confuse debugging).

All this while the xfer is still being used by usb_transfer_complete() and a 
crash occurs when it tries to use it again because the xfer is in an 
inconsistent state (being freed/reallocated/reused).

>How-To-Repeat:
Transfer lots of data for several minutes through USB (in my case using ugen), 
and notice a panic is triggered.

After enabling some debug output, notice that usually, ehci_idone() finishes 
before usbd_bulk_transfer() resumes:

ehci_idone: xfer=0xfffffe810ff7b8a8, pipe=0xfffffe810cde7d50 ready
ehci_idone: len=1024, actlen=1024, status=0x8d00
usb_freemem: large free
ehci_idone: ex=0xfffffe810ff7b8a8 done
usbd_bulk_transfer: transferred 1024

... except just before the crash, where ehci_idone() does not finish before the 
same xfer gets reused:

ehci_idone: xfer=0xfffffe810ff7b8a8, pipe=0xfffffe810cde7d50 ready
ehci_idone: len=1024, actlen=1024, status=0x8d00
usb_freemem: large free
usbd_bulk_transfer: transferred 1024  
panic: uskbedr_nbeull kd_itargannossfteirc:  asstsaerrtt iotnr a"ncsvf_iesr_ va1
l0i2d4( cbv)" tfeasi
led: ufslbe_ a"l.l.o/c.m.e/m.:. /l.a.r/gkee ranl/lkoecr n1_0c2o4n
dvar.c", line 340
cpu0: Begin traceback...
vpanic() at netbsd:vpanic+0x136
kern_assert() at netbsd:kern_assert+0x48
cv_broadcast() at netbsd:cv_broadcast+0x53
usb_transfer_complete() at netbsd:usb_transfer_complete+0x253
ehci_idone() at netbsd:ehci_idone+0x19d
ehci_softintr() at netbsd:ehci_softintr+0x40
usb_soft_intr() at netbsd:usb_soft_intr+0x24
softint_dispatch() at netbsd:softint_dispatch+0xd9
DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfffffe80a21b7d70
Xsoftintr() at netbsd:Xsoftintr+0x4f

>Fix:
I submit this PR because it's not clear to me what is the right solution. The 
cv_broadcast/wakeup is done in two places :

in the callbacks called by usb_transfer_complete()
right after, directly in usb_transfer_complete()

Removing one of the two is required, but I don't know which is intended 
conceptually. Removing any of the two should fix the race because the only 
reason the xfer is reused in usb_transfer_complete() is to do this 
cv_broadcast()...



Home | Main Index | Thread Index | Old Index