kern/52961: ohci: panic on shutdown

To: kern-bug-people%netbsd.org@localhost,gnats-admin%netbsd.org@localhost,netbsd-bugs%netbsd.org@localhost
Subject: kern/52961: ohci: panic on shutdown
From: ozaki-r%netbsd.org@localhost
Date: Mon, 29 Jan 2018 09:15:00 +0000 (UTC)

>Number:         52961
>Category:       kern
>Synopsis:       ohci: panic on shutdown
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Jan 29 09:15:00 +0000 2018
>Originator:     Ryota Ozaki
>Release:        -current
>Organization:
IIJ
>Environment:
NetBSD kvm 8.99.12 NetBSD 8.99.12 (KVM.PROF) #117: Mon Jan 29 12:05:56 JST 2018  ozaki-r@rangeley:(hidden) amd64
>Description:
ohci dies in ohci_detach on shutdown with:

panic: kernel diagnostic assertion "interlock == NULL || mutex_owned(interlock)" failed: file "/home/ozaki-r/git/netbsd-src/sys/kern/kern_timeout.c", line 476 
fatal breakpoint trap in supervisor mode
trap type 1 code 0 rip 0xffffffff8021ce28 cs 0x8 rflags 0x206 cr2 0x7f7ff7763570 ilevel 0 rsp 0xffff800044d8bbc0
curlwp 0xffffe4007c4ba360 pid 444.1 lowest kstack 0xffff800044d882c0
Stopped in pid 444.1 (reboot) at        netbsd:breakpoint+0x10: leave
db{0}> bt
breakpoint() at netbsd:breakpoint+0x10
vpanic() at netbsd:vpanic+0x145
kern_assert() at netbsd:kern_assert+0x4d
callout_halt() at netbsd:callout_halt+0x102
ohci_detach() at netbsd:ohci_detach+0x4a
ohci_pci_detach() at netbsd:ohci_pci_detach+0x2b
config_detach() at netbsd:config_detach+0x110
config_detach_all() at netbsd:config_detach_all+0x9c
cpu_reboot() at netbsd:cpu_reboot+0x177
sys_reboot() at netbsd:sys_reboot+0x7a
syscall() at netbsd:syscall+0x1f2
--- syscall (number 208) ---


>How-To-Repeat:
Shut down NetBSD on a system with ohci devices (VirtualBox in my case).
>Fix:
It seems callout_halt in ohci_detach is passed the wrong mutex, sc_lock.
The callout (ohci_rhsc_enable) takes sc_intr_lock, so if an interlock were
needed we should pass that to callout_halt, not sc_lock.

However, ohci_detach doesn't hold any lock at that point in the first
place, so there is no interlock to pass and the argument should be NULL.
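
To make the failing assertion concrete, here is a small userland sketch of
the condition quoted in the panic message. It is a toy model, not kernel
code: struct model_mutex and every model_* function are hypothetical names
invented for illustration, and the "held" flag only stands in for what
mutex_owned() reports in a single-threaded setting.

```c
/* Userland toy model, NOT kernel code. All model_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_mutex {
	bool held;	/* true while the single modelled thread holds it */
};

static void
model_mutex_enter(struct model_mutex *m)
{
	assert(!m->held);	/* no recursive locking in this model */
	m->held = true;
}

static void
model_mutex_exit(struct model_mutex *m)
{
	assert(m->held);	/* must be held to be released */
	m->held = false;
}

/*
 * Evaluates the asserted condition from the panic message:
 *     interlock == NULL || mutex_owned(interlock)
 * and returns whether it holds instead of panicking.
 */
static bool
model_halt_precondition(const struct model_mutex *interlock)
{
	return interlock == NULL || interlock->held;
}
```

In this model, passing NULL always satisfies the condition, passing a mutex
that has been entered satisfies it, and passing a mutex that is not held
(which is what ohci_detach does with sc_lock) does not.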

There is also a race condition: callout_reset can be called after
callout_halt. callout_reset is called from a softint handler
(ohci_rhsc_softint), so we can avoid the race by disestablishing the
softint before calling callout_halt.
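
The ordering argument can be sketched with a deterministic userland model.
This is only an illustration under stated assumptions: struct model_sc and
the model_* functions are hypothetical, the two flags stand in for
sc_rhsc_si and sc_tmo_rhsc, and the "late softint" is placed by hand at the
worst possible point rather than scheduled by a real kernel.

```c
/* Userland toy model, NOT kernel code. All model_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_sc {
	bool softint_established;	/* stands in for sc_rhsc_si */
	bool callout_armed;		/* stands in for sc_tmo_rhsc */
};

/* Stand-in for ohci_rhsc_softint: re-arms the callout if still established. */
static void
model_softint_run(struct model_sc *sc)
{
	assert(sc != NULL);
	if (sc->softint_established)
		sc->callout_armed = true;	/* models callout_reset */
}

static void
model_softint_disestablish(struct model_sc *sc)
{
	sc->softint_established = false;
}

static void
model_callout_halt(struct model_sc *sc)
{
	sc->callout_armed = false;
}

/* Halt first: a late softint can re-arm the callout afterwards. */
static bool
model_detach_halt_first(struct model_sc *sc)
{
	model_callout_halt(sc);
	model_softint_run(sc);		/* late softint slips in here */
	model_softint_disestablish(sc);
	return sc->callout_armed;	/* armed again: the race */
}

/* Disestablish first: a late softint can no longer re-arm the callout. */
static bool
model_detach_disestablish_first(struct model_sc *sc)
{
	model_softint_disestablish(sc);
	model_softint_run(sc);		/* no effect any more */
	model_callout_halt(sc);
	return sc->callout_armed;	/* stays halted */
}
```

With both flags initially true, the halt-first order leaves the callout
armed after the sequence, while the disestablish-first order leaves it
halted, which is the ordering proposed here.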

So a possible fix would look like this (not tested):

diff --git a/sys/dev/usb/ohci.c b/sys/dev/usb/ohci.c
index 3fc23607ffd..28443917cf2 100644
--- a/sys/dev/usb/ohci.c
+++ b/sys/dev/usb/ohci.c
@@ -377,13 +377,11 @@ ohci_detach(struct ohci_softc *sc, int flags)
        if (rv != 0)
                return rv;
 
-       callout_halt(&sc->sc_tmo_rhsc, &sc->sc_lock);
-
-       usb_delay_ms(&sc->sc_bus, 300); /* XXX let stray task complete */
-       callout_destroy(&sc->sc_tmo_rhsc);
-
        softint_disestablish(sc->sc_rhsc_si);
 
+       callout_halt(&sc->sc_tmo_rhsc, NULL);
+       callout_destroy(&sc->sc_tmo_rhsc);
+
        cv_destroy(&sc->sc_softwake_cv);
 
        mutex_destroy(&sc->sc_lock);
