Subject: ohci_softintr() panics because of chipset bug?
To: None <tech-kern@netbsd.org>
From: Karl Janmar <karl@utopiafoundation.org>
List: tech-kern
Date: 07/13/2005 16:36:55
I get panics once in a while in ohci_softintr(). They occur in the
code that is about to reverse the done list (see the backtrace in the
PRs listed below).
Open PRs on the subject:
kern/30331 [serious/medium]: kernel panic somethimes with USB printer
kern/30398 [serious/low]: panic in ohci_softintr
(The first PR was sent by me.)
I have searched around for information on what could cause the
problems with my USB host.
Where I stand:
* I have looked through the FreeBSD code and their CVS history for any
changes related to this problem, without success.
* I have looked in the NetBSD CVS history for relevant changes, but I
can't see anything obvious.
* I have tried the latest ohci.c (from HEAD), but with the same result.
* I have looked at the Linux source to see what they do in a similar
situation. If I understand the code correctly, they just print an
error message to the console and abort the iteration.
http://fxr.watson.org/fxr/source/drivers/usb/host/ohci-q.c?v=linux-2.6.11.8#L866
Questions:
* In ohci_softintr(): When our driver reverses the done list, it calls
ohci_hash_find_td() to get the td (transfer descriptor) at the
specified physical address. Why aren't ohci_hash_find_td() and
ohci_hash_find_idt() protected with splusb()?
* If we simply abort the done-list traversal when we detect this
condition (that done is an invalid itd or td), what information needs
to be checked to get back to a known state? I know that can be risky
business...
* I have read suggestions that these conditions could occur because of
buggy DMA. Is this likely?
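
To make the second question concrete, here is a minimal user-space
sketch of a done-list reversal that stops at an unknown address instead
of panicking. It is not taken from ohci.c or from the Linux driver:
struct soft_td, td_hash_add(), td_hash_find() and reverse_done_list()
are hypothetical names, and the descriptor layout is an assumption made
only for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for a transfer descriptor. */
struct soft_td {
	uint32_t physaddr;	/* address the controller uses for this TD */
	uint32_t next_phys;	/* next TD on the done list, 0 = end */
	struct soft_td *hnext;	/* hash chain link */
	struct soft_td *dnext;	/* link in the reversed (software) list */
};

#define TD_HASH_SIZE 16
#define TD_HASH(a) (((a) >> 4) % TD_HASH_SIZE)

struct td_hash {
	struct soft_td *bucket[TD_HASH_SIZE];
};

/* Register a TD so it can later be found by its physical address. */
static void
td_hash_add(struct td_hash *h, struct soft_td *td)
{
	unsigned int i = TD_HASH(td->physaddr);

	td->hnext = h->bucket[i];
	h->bucket[i] = td;
}

/* Return the TD at physical address pa, or NULL if we never created it. */
static struct soft_td *
td_hash_find(const struct td_hash *h, uint32_t pa)
{
	struct soft_td *td;

	for (td = h->bucket[TD_HASH(pa)]; td != NULL; td = td->hnext)
		if (td->physaddr == pa)
			return td;
	return NULL;
}

/*
 * Reverse the done list that starts at physical address 'done'.
 * On an unknown address, report it, store it in *bad and stop; the
 * TDs collected so far are still returned. A corrupted list could
 * also contain a cycle, which this sketch does not guard against.
 */
static struct soft_td *
reverse_done_list(const struct td_hash *h, uint32_t done, uint32_t *bad)
{
	struct soft_td *rev = NULL;

	*bad = 0;
	while (done != 0) {
		struct soft_td *td = td_hash_find(h, done);

		if (td == NULL) {
			fprintf(stderr, "done list: unknown TD 0x%08lx, "
			    "aborting traversal\n", (unsigned long)done);
			*bad = done;
			break;
		}
		td->dnext = rev;
		rev = td;
		done = td->next_phys;
	}
	return rev;
}
```

In this sketch, every TD that would have followed the bad address is
never visited, so the transfers that own those TDs would never be
completed by this path. That is the part of the state that would need
separate cleanup, and it is what the question above is about.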
Regards,
Karl