Subject: Re: Fix/workaround for USB ** host controller halted ** errors?
To: None <hselasky@c2i.net>
From: Rafal Boni <rafal@pobox.com>
List: tech-kern
Date: 10/05/2005 12:16:11
In message <200510051259.40491.hselasky@c2i.net>, you write: 

-> On Wednesday 05 October 2005 01:52, davef1624@aol.com wrote:
-> > On a heavily loaded i386 (dual-Xeon) based system,
-> > using Intel's ICH3 I/O controller chip (which includes a 1.1 USB HC),
-> > I'm seeing the following uhci errors:
-> >
-> > kernel: uhci1: host controller process error
-> > kernel: uhci1: host controller halted
-> >
-> > This occurs after several hours of load on the system.
-> > Apparently, the USB host controller complains about an inconsistency
-> > when processing one of the TDs in its Frame List.
-> >
-> > Is there any workaround and/or fix for this issue?
-> > I don't want to reboot to solve this.
-> >
-> > In reviewing the uhci code, there could be a race condition when
-> > adding/removing Queue Heads between the HCD and the HC;
-> > specifically, the T-bit needs to be set in the elink field of the QH
-> > so that the HC doesn't follow the pointer.
-> > However, after setting the T-bit, there is a call to
-> > delay(UHCI_QH_REMOVE_DELAY) to give the HC time to stop looking at
-> > the TD.
-> 
-> The UHCI driver must remove all QH's from the schedule before touching 
-> anything. As long as a QH is in the schedule, it is owned by the USB  
-> controller! That I think is the reason for the problem, which also happens
-> to be the case with the EHCI driver.
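
To make that ownership rule concrete, here is a minimal sketch in C of the
ordering being described: unlink first, wait for the controller to move on,
and only then touch the QH.  It is not the real uhci code.  The struct, the
sketch_* names and the in_schedule flag (which stands in for relinking the
previous QH's horizontal pointer) are made up for illustration, and treating
one frame boundary as a sufficient wait is an assumption of the sketch.  The
only hardware fact used is that bit 0 of a UHCI link pointer is Terminate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LINK_T 0x00000001u      /* UHCI link pointer bit 0: Terminate */

/* Hypothetical, simplified queue head; not the driver's real soft QH. */
struct sketch_qh {
	uint32_t elink;          /* element link, as the controller reads it */
	uint32_t unlinked_frame; /* frame number at which it was unlinked */
	bool     in_schedule;    /* true while the controller may fetch it */
};

/* Step 1: take the QH out of the schedule and remember the frame. */
static void
sketch_qh_unlink(struct sketch_qh *qh, uint32_t now_frame)
{
	qh->elink |= LINK_T;     /* controller must not follow the pointer */
	qh->in_schedule = false;
	qh->unlinked_frame = now_frame;
}

/*
 * Step 2: the driver owns the QH only once the frame counter has moved
 * on, so that a fetch started in the unlink frame has completed.
 */
static bool
sketch_qh_driver_owns(const struct sketch_qh *qh, uint32_t now_frame)
{
	return !qh->in_schedule && now_frame != qh->unlinked_frame;
}

/* Step 3: only now is it safe to rewrite the element pointer. */
static void
sketch_qh_set_element(struct sketch_qh *qh, uint32_t td_phys,
    uint32_t now_frame)
{
	assert(sketch_qh_driver_owns(qh, now_frame));
	qh->elink = td_phys;     /* T bit clear: element pointer is valid */
}
```

The point of keeping the check separate from the unlink is that the wait can
then be a deferred check against the frame counter instead of a fixed
delay() whose length has to be guessed.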

Yes, in fact I see this occasionally on an EHCI controller on my sparc64 (a
PCI card with a VIA VT8237 EHCI in an Ultra5).  Besides this issue, I also
have problems copying large amounts of data to my USB 2.0 CF fob; I get
errors like:

	umass0: Invalid CSW: tag <N> should be <N+1>
	umass0: BBB reset failed, STALLED

These errors lead to msdosfs diagnostic assertions, to the copy hanging, or
to the controller halt discussed above ('ehci0: unrecoverable error,
controller halted' followed by 'ehci0: blocking intrs 0x10'), after which a
reboot is required to get USB2 devices working again.  This has happened
both with a Sony Microvault CF fob and with my iRiver MP3 player in
mass-storage mode.

The umass problems seem either to be USB2/EHCI-specific or to be much
harder to trigger when the device is plugged into a USB1 hub, because of
the slower timing.
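
For reference, the "Invalid CSW" message comes from the kind of check
sketched below in C.  In Bulk-Only Transport every command block carries a
tag, and the 13-byte status block that ends the command must carry the
"USBS" signature and echo that tag; a status with tag N where N+1 was
expected looks like the status of the previous command arriving late.  This
is a standalone sketch, not the umass code: csw_validate, le32 and the enum
are names made up here.

```c
#include <stddef.h>
#include <stdint.h>

#define CSW_SIGNATURE 0x53425355u  /* "USBS", little-endian on the wire */
#define CSW_LENGTH    13           /* signature, tag, residue, status */

/* Hypothetical result codes for this sketch. */
enum csw_check { CSW_OK, CSW_BAD_LENGTH, CSW_BAD_SIGNATURE, CSW_BAD_TAG };

/* Read a 32-bit little-endian value regardless of host byte order. */
static uint32_t
le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	    ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Check a received status block against the tag of the command sent. */
static enum csw_check
csw_validate(const uint8_t *buf, size_t len, uint32_t expected_tag)
{
	if (len != CSW_LENGTH)
		return CSW_BAD_LENGTH;
	if (le32(buf) != CSW_SIGNATURE)
		return CSW_BAD_SIGNATURE;
	if (le32(buf + 4) != expected_tag)
		return CSW_BAD_TAG;   /* e.g. a stale status from tag N */
	return CSW_OK;
}
```

Reading the fields byte by byte matters on a big-endian host such as
sparc64, since the wire format is little-endian.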

--rafal

----
Rafal Boni                                                     rafal@pobox.com
  We are all worms.  But I do believe I am a glowworm.  -- Winston Churchill