Current-Users archive


Re: Crash in xhci driver





On 2016/06/30 10:17, Paul Goyette wrote:
I got this crash again.  This time I was paying a bit more attention.

The system actually froze for a few seconds (maybe 10 or 15 seconds), but it was still running.  No keyboard or mouse input was processed, but my xclock was still updating the clock display.

Then it switched to wscons display 0 and dropped into the debugger.

There were a whole slew of WARNING: SPL NOT LOWERED ON SYSCALL ... messages, more than a full page.  The last such message was


    WARNING: SPL NOT LOWERED ON SYSCALL 320 4 EXIT bd626a08 6

and then the same stack trace as before.  No actual panic message was displayed.

I have not yet tried the patch you have suggested - as soon as I finish my current project, I'll rebuild and reinstall with your patch.

Can you add these kernel options:

options DIAGNOSTIC
options USB_DEBUG
options XHCI_DEBUG

and set this sysctl?

hw.xhci.debug=14

However, I have no idea how to see the logs without a keyboard.


On Mon, 27 Jun 2016, Paul Goyette wrote:

On Mon, 27 Jun 2016, Takahiro Hayashi wrote:

Hi,

On 2016/06/27 09:22, Paul Goyette wrote:
Hmmm, my system was totally idle (just running xlockmore's "maze" screen saver!), and suddenly panic()d.

Here's the traceback (manually transcribed):

    vpanic + 0x140
    cd_play_msf
    xhci_new_device + 0x821
    usbd_new_device + 0x3e
    uhub_explore + 0x2fa
    usb_discover.isra.2 + 0x4e     (interesting symbol name!)
    usb_event_thread + 0x7c

According to gdb(1), this was the KASSERT() at sys/dev/usb/xhci.c:2106

2101                    //hexdump("slot context", cp, sc->sc_ctxsz);
2102                    uint8_t addr = XHCI_SCTX_3_DEV_ADDR_GET(cp[3]);
2103                    DPRINTFN(4, "device address %u", addr, 0, 0, 0);
2104                    /* XXX ensure we know when the hardware does something
2105                       we can't yet cope with */
2106                    KASSERT(addr >= 1 && addr <= 127);
2107                    dev->ud_addr = addr;
2108                    /* XXX dev->ud_addr not necessarily unique on bus */
2109                    KASSERT(bus->ub_devices[dev->ud_addr] == NULL);
2110                    bus->ub_devices[dev->ud_addr] = dev;

No devices were being inserted (or removed), so I'm unsure why it would be calling xhci_new_device().  The comments in the source seem to say that this code only gets called when a new device has been found....
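
To make the failing check concrete, here is a minimal standalone sketch of what the KASSERT() at line 2106 tests. The helper names sctx3_dev_addr() and usb_addr_valid() are hypothetical and are not taken from xhci.c. The sketch assumes the USB device address occupies bits 7:0 of slot context dword 3, which is where XHCI_SCTX_3_DEV_ADDR_GET() appears to read it.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for XHCI_SCTX_3_DEV_ADDR_GET(): assumes the
 * USB device address occupies bits 7:0 of slot context dword 3.
 */
static uint8_t
sctx3_dev_addr(uint32_t dw3)
{
	return (uint8_t)(dw3 & 0xff);
}

/* Same range test as the KASSERT(): valid USB addresses are 1..127. */
static bool
usb_addr_valid(uint8_t addr)
{
	return addr >= 1 && addr <= 127;
}
```

Under these assumptions, a slot context whose address field still reads 0, or anything above 127, would fail the range test and trip the assertion on a DIAGNOSTIC kernel.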

Does your PC have a USB keyboard and mouse?

Yes, one USB keyboard, one USB mouse

What is the PCI vendor and product number of xHCI?

xhci0 at pci0 dev 20 function 0: vendor 8086 product 8c31 (rev. 0x05)
xhci0: interrupting at msi0 vec 0
xhci0: xHCI version 1.0
usb0 at xhci0: USB revision 3.0

Does your kernel say anything before panic after boot?

Hard to tell, since it was on my X display, running the "maze" module from xlockmore.  :)  When the panic occurred, it "jumped" to the console display, displayed the panic message, and displayed a 'db' prompt.  Of course, since the keyboard is USB, it was unusable at that point!  :)

And how frequently does the panic happen?

I've been running this kernel for a couple of weeks now, and the panic occurred only one time.

Could you try this patch?

--- sys/dev/usb/xhci.c.bak    2016-06-13 01:32:30.000000000 +0900
+++ sys/dev/usb/xhci.c    2016-06-27 18:27:55.000000000 +0900
@@ -2091,8 +2129,10 @@ xhci_new_device(device_t parent, struct
         /* 4.3.4 Address Assignment */
        err = xhci_set_address(dev, slot, false);
-        if (err)
+        if (err) {
+            printf("%s: set address w/ bsr %u\n", __func__, err);
            goto bad;
+        }
         /* Allow device time to set new address */
        usbd_delay_ms(dev, USB_SET_ADDRESS_SETTLE);
@@ -2103,7 +2143,8 @@ xhci_new_device(device_t parent, struct
        DPRINTFN(4, "device address %u", addr, 0, 0, 0);
        /* XXX ensure we know when the hardware does something
           we can't yet cope with */
-        KASSERT(addr >= 1 && addr <= 127);
+        KASSERTMSG(addr >= 1 && addr <= 127, "addr %u out of range",
+            addr);
        dev->ud_addr = addr;
        /* XXX dev->ud_addr not necessarily unique on bus */
        KASSERT(bus->ub_devices[dev->ud_addr] == NULL);

I will add this on my next kernel build.  The messages look useful, even if I cannot reproduce my panic!



--
t-hash

