Subject: kern/10662: USB host controller halts under load
To: None <gnats-bugs@gnats.netbsd.org>
From: IWAMOTO Toshihiro <iwamoto@sat.t.u-tokyo.ac.jp>
List: netbsd-bugs
Date: 07/23/2000 08:58:14
>Number:         10662
>Category:       kern
>Synopsis:       USB host controller halts under load
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Jul 23 08:59:00 PDT 2000
>Closed-Date:
>Last-Modified:
>Originator:     IWAMOTO Toshihiro
>Release:        3 weeks ago -current
>Organization:
	
>Environment:
	
System: NetBSD kiku.my.domain 1.5B NetBSD 1.5B (KIKU) #62: Sun Jul 23 22:24:03 JST 2000 toshii@kiku.my.domain:/usr/src/syssrc/sys/arch/i386/compile/KIKU i386
	This machine's chipset is i810.

>Description:
	When the machine is under heavy system load and transmitting data
	through a uftdi(4) serial port, the USB host controller's consistency
	check fires and the controller halts. The problem was first noticed
	while running rsync processes over a serial PPP link.

Jul 23 20:56:10 kiku /netbsd: uhci0: host controller process error
Jul 23 20:56:12 kiku /netbsd: uhci0: host controller halted
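
	For reference, the two messages above correspond to status bits in
	the UHCI USBSTS register. The sketch below decodes a status word in
	the same way. The bit values come from the UHCI specification. The
	macro and function names are hypothetical and local to this sketch,
	which is not the driver's actual interrupt handler.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* UHCI USBSTS status register bits (values per the UHCI specification). */
#define STS_USBINT 0x0001u /* USB interrupt (transfer completed) */
#define STS_USBEI  0x0002u /* USB error interrupt */
#define STS_RD     0x0004u /* resume detect */
#define STS_HSE    0x0008u /* host system error */
#define STS_HCPE   0x0010u /* host controller process error (consistency check) */
#define STS_HCH    0x0020u /* host controller halted */

/*
 * Hypothetical helper: print one line per fatal condition found in a
 * USBSTS value and return how many were found.
 */
static int report_fatal(const char *dev, uint16_t status)
{
	int n = 0;

	if (status & STS_HSE) {
		printf("%s: host system error\n", dev);
		n++;
	}
	if (status & STS_HCPE) {
		printf("%s: host controller process error\n", dev);
		n++;
	}
	if (status & STS_HCH) {
		printf("%s: host controller halted\n", dev);
		n++;
	}
	return n;
}
```

	Calling report_fatal("uhci0", STS_HCPE | STS_HCH) prints the same two
	messages as the log above and returns 2.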

	The inconsistency appears to be caused by the implementation of
	uhci_device_bulk_done(). It removes the finished QH from the QH
	chain and immediately calls uhci_free_std_chain(), which writes
	0x12345678 to td_token to mark each TD as free. Because this is not
	a valid TD token value, a controller that still reads the TD can
	trigger the consistency check.
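
	The reason 0x12345678 is rejected: bits 7:0 of a TD token hold the
	USB packet ID (PID), and the UHCI specification cites an illegal PID
	as an example of a consistency-check failure. The following is a
	minimal sketch that assumes only the PID field is checked. The real
	controller may check more, and the names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* USB packet identifiers as they appear in bits 7:0 of a UHCI TD token. */
#define PID_SETUP 0x2Du
#define PID_IN    0x69u
#define PID_OUT   0xE1u

/* Value the driver writes to td_token when freeing a TD (from this report). */
#define TD_FREE_MARKER 0x12345678u

/* Extract the PID field (bits 7:0) of a TD token. */
static uint8_t td_token_pid(uint32_t token)
{
	return (uint8_t)(token & 0xFFu);
}

/*
 * Hypothetical model of one part of the consistency check: a TD whose
 * PID is not SETUP, IN or OUT cannot be executed.
 */
static bool td_pid_is_valid(uint32_t token)
{
	uint8_t pid = td_token_pid(token);

	return pid == PID_SETUP || pid == PID_IN || pid == PID_OUT;
}
```

	For the free marker, td_token_pid() yields 0x78, which matches none
	of the three valid PIDs.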
	The suspected sequence of events is:
	1) the host controller reads the finished QH and starts to process
	   the TDs linked to it;
	2) uhci_remove_bulk() removes the finished QH from the QH chain;
	3) uhci_free_std() writes 0x12345678 to td_token;
	4) the host controller reads the TD modified in step 3
	   (steps 2 and 3 do not change any link pointers, so the QH's
	   element link still points at the freed TD).
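
	The four steps can be replayed deterministically on a single thread.
	This is a simplified, hypothetical model: NULL stands in for a
	terminated link, and only the PID is checked. It assumes the
	controller has fetched the QH but not yet its element link when the
	driver runs. Under that assumption, terminating the element link
	before unlinking the QH keeps the controller away from the freed TD.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PID_SETUP      0x2Du
#define PID_IN         0x69u
#define PID_OUT        0xE1u
#define TD_FREE_MARKER 0x12345678u /* written to td_token when a TD is freed */

/* Hypothetical, heavily simplified descriptors; a NULL link means "terminate". */
struct td {
	uint32_t token;
};

struct qh {
	struct qh *hlink; /* next QH in the schedule */
	struct td *elink; /* first TD queued on this QH */
};

static bool pid_is_valid(uint32_t token)
{
	uint32_t pid = token & 0xFFu;

	return pid == PID_SETUP || pid == PID_IN || pid == PID_OUT;
}

/*
 * Replay steps 1-4 in order. Returns true if the "controller" ends up
 * reading a TD with an invalid PID, i.e. the consistency check would
 * fire. If terminate_elink is true, the driver first terminates the
 * QH's element link.
 */
static bool replay_race(bool terminate_elink)
{
	struct td td = { .token = PID_OUT };           /* bulk OUT: transmitting */
	struct qh qh = { .hlink = NULL, .elink = &td };
	struct qh prev = { .hlink = &qh, .elink = NULL };

	/* 1) The controller follows prev.hlink and is now working on qh. */
	struct qh *hc_qh = prev.hlink;

	/* 2) The driver unlinks qh from the schedule. */
	if (terminate_elink)
		qh.elink = NULL;
	prev.hlink = qh.hlink;

	/* 3) The driver frees the TD by overwriting its token. */
	td.token = TD_FREE_MARKER;

	/* 4) The controller reads the element link of the QH it already holds. */
	struct td *hc_td = hc_qh->elink;
	if (hc_td == NULL)
		return false; /* link terminated: nothing to execute */
	return !pid_is_valid(hc_td->token);
}
```

	replay_race(false) returns true and replay_race(true) returns false.
	If the controller had already latched the TD pointer before step 2,
	terminating the link would not help; that may be why some hardware
	could still need a DELAY().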

	I am not certain this is the real cause, but inserting DELAY(100)
	(a 100-microsecond busy-wait) before uhci_free_std_chain() also
	makes the problem go away, which supports the scenario above.

>How-To-Repeat:
	First, place the machine under heavy system load:
	$ sh -c "while true; do dd if=/dev/zero of=zero count=256000; rm zero; sleep 2;done" &
	Then create a ppp link over a uftdi(4) serial port and transmit
	some data over the link:
	$ while true; do scp /netbsd 10.9.8.7:/tmp; sleep 4; done
>Fix:
	The following diff appears to solve the problem: uhci_remove_bulk()
	now terminates the QH's element link before unlinking the QH.
	Some hardware may still need a DELAY() before
	uhci_free_std_chain() is called.
cvs diff: Diffing .
Index: uhci.c
===================================================================
RCS file: /export/kiku/NetBSD/NetBSD-CVS/syssrc/sys/dev/usb/uhci.c,v
retrieving revision 1.120
diff -u -r1.120 uhci.c
--- uhci.c      2000/06/01 15:51:26     1.120
+++ uhci.c      2000/07/23 13:23:52
@@ -979,6 +979,7 @@
        SPLUSBCHECK;

        DPRINTFN(10, ("uhci_remove_bulk: sqh=%p\n", sqh));
+       sqh->qh.qh_elink = htole32(UHCI_PTR_T);
        pqh = uhci_find_prev_qh(sc->sc_bulk_start, sqh);
        pqh->hlink       = sqh->hlink;
        pqh->qh.qh_hlink = sqh->qh.qh_hlink;
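
	The added line stores htole32(UHCI_PTR_T) into qh_elink. UHCI link
	pointers are little-endian 32-bit words whose low bits are flags:
	bit 0 (T) marks the link as terminated and bit 1 (Q) selects whether
	it points at a QH or a TD. The sketch below uses hypothetical names
	and a byte-by-byte store in place of htole32() to show what the
	controller sees after such a store.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag bits in a UHCI link pointer. */
#define LINK_T 0x1u /* terminate: no valid pointer follows */
#define LINK_Q 0x2u /* 1 = points at a QH, 0 = points at a TD */

/* Build a link pointer from a 16-byte-aligned physical address. */
static uint32_t make_link(uint32_t paddr, bool is_qh)
{
	assert((paddr & 0xFu) == 0); /* descriptors are 16-byte aligned */
	return paddr | (is_qh ? LINK_Q : 0u);
}

/* Store a 32-bit value in little-endian byte order, as the hardware reads it. */
static void store_le32(uint8_t dst[4], uint32_t v)
{
	dst[0] = (uint8_t)(v);
	dst[1] = (uint8_t)(v >> 8);
	dst[2] = (uint8_t)(v >> 16);
	dst[3] = (uint8_t)(v >> 24);
}

/* True if the stored link has its terminate bit set. */
static bool link_is_terminated(const uint8_t src[4])
{
	return (src[0] & LINK_T) != 0;
}
```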
>Release-Note:
>Audit-Trail:
>Unformatted: