Subject: ohci_softintr() panic's because of chipset bug?
To: None <tech-kern@netbsd.org>
From: Karl Janmar <karl@utopiafoundation.org>
List: tech-kern
Date: 07/13/2005 16:36:55
I get occasional panics in ohci_softintr(). They occur in the code
that reverses the done list (see the backtrace in the PR below).

Open PRs on the subject:
kern/30331 [serious/medium]: kernel panic somethimes with USB printer
kern/30398 [serious/low]: panic in ohci_softintr

(I filed the first PR.)

I have searched for information on what could cause these problems with
my USB host.

Where I stand:

* I have looked through the FreeBSD code and its CVS history for any
  changes related to this problem, without success.
* I have looked through the NetBSD CVS history for relevant changes,
  but I can't see anything obvious.
* I have tried the latest ohci.c (from HEAD), with the same result.
* I have looked at the Linux source to see what it does in a similar
  situation. If I understand the code correctly, it just prints an
  error message to the console and aborts the iteration:
http://fxr.watson.org/fxr/source/drivers/usb/host/ohci-q.c?v=linux-2.6.11.8#L866
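
To make the "log and abort" idea concrete, here is a minimal userland
sketch of reversing a done list while looking up each entry by physical
address. It is not the NetBSD or Linux code: struct soft_td,
td_hash_add(), td_hash_find() and reverse_done_list() are hypothetical
names made up for illustration, and the hash is a simplified stand-in
for whatever the real driver uses. It only shows the control flow, not
the locking or the recovery afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define TD_HASH_SIZE 16

/* Hypothetical, simplified stand-in for a driver's software TD. */
struct soft_td {
	uint32_t	 physaddr;	/* address as seen by the controller */
	uint32_t	 next_phys;	/* link written by the controller, 0 = end */
	struct soft_td	*hnext;		/* hash chain */
	struct soft_td	*dnext;		/* link in the reversed done list */
};

static struct soft_td *td_hash[TD_HASH_SIZE];

static unsigned
td_hash_index(uint32_t pa)
{
	return (pa >> 4) % TD_HASH_SIZE;
}

/* Register a TD so it can later be found by physical address. */
static void
td_hash_add(struct soft_td *td)
{
	unsigned h = td_hash_index(td->physaddr);

	td->hnext = td_hash[h];
	td_hash[h] = td;
}

/* Return the TD at physical address pa, or NULL if it is unknown. */
static struct soft_td *
td_hash_find(uint32_t pa)
{
	struct soft_td *td;

	for (td = td_hash[td_hash_index(pa)]; td != NULL; td = td->hnext)
		if (td->physaddr == pa)
			return td;
	return NULL;
}

/*
 * Reverse the done list starting at done_head.  On an unknown address,
 * log it, store it in *bad and stop, instead of dereferencing NULL.
 * Returns the number of TDs recovered; *out is the reversed list.
 */
static int
reverse_done_list(uint32_t done_head, struct soft_td **out, uint32_t *bad)
{
	struct soft_td *rev = NULL, *td;
	uint32_t pa;
	int n = 0;

	assert(out != NULL && bad != NULL);
	*bad = 0;
	for (pa = done_head; pa != 0; pa = td->next_phys) {
		td = td_hash_find(pa);
		if (td == NULL) {
			fprintf(stderr, "done list: unknown TD 0x%08x, "
			    "aborting\n", (unsigned)pa);
			*bad = pa;
			break;
		}
		td->dnext = rev;
		rev = td;
		n++;
	}
	*out = rev;
	return n;
}
```

The TDs found before the bad entry are still returned, so a caller
could complete those and then decide what to do about the rest. The
sketch does not guard against a done list that loops back on itself.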


Questions:

* In ohci_softintr(): when our driver reverses the done list, it calls
ohci_hash_find_td() to get the TD (transfer descriptor) at the
specified physical address. Why aren't ohci_hash_find_td() and
ohci_hash_find_itd() protected with splusb()?

* If we simply abort the done-list traversal when we detect this
condition (that done is an invalid itd or td), what needs to be
checked to get back to a known state? I know that can be risky
business...

* I have read suggestions that these conditions could be caused by
buggy DMA. Is this likely?

Regards,

Karl