Subject: kern/10662: USB host controller halts under load
To: None <firstname.lastname@example.org>
From: IWAMOTO Toshihiro <email@example.com>
Date: 07/23/2000 08:58:14
>Synopsis: USB host controller halts under load
>Arrival-Date: Sun Jul 23 08:59:00 PDT 2000
>Originator: IWAMOTO Toshihiro
>Release: 3 weeks ago -current
System: NetBSD kiku.my.domain 1.5B NetBSD 1.5B (KIKU) #62: Sun Jul 23 22:24:03 JST 2000 firstname.lastname@example.org:/usr/src/syssrc/sys/arch/i386/compile/KIKU i386
This machine's chipset is i810.
When the machine is under heavy system load and is transmitting data
through a uftdi(4) serial port, the USB host controller raises its
consistency check interrupt and halts. The problem was first
noticed while running rsync processes over a serial ppp link:
Jul 23 20:56:10 kiku /netbsd: uhci0: host controller process error
Jul 23 20:56:12 kiku /netbsd: uhci0: host controller halted
It seems that the cause of this inconsistency is the implementation
of uhci_device_bulk_done(). It removes the finished QH from the QH
chain and immediately calls uhci_free_std_chain().
uhci_free_std_chain() writes 0x12345678 to td_token to mark
the TD as free. As this is not a valid TD token value,
this can trigger the consistency check.
So the scenario leading to this inconsistency is:
1) the host controller reads the finished QH and starts to process
the TDs linked to the QH
2) uhci_remove_bulk() removes the finished QH
3) uhci_free_std() writes 0x12345678 to td_token
4) the host controller reads the TD modified in step 3
(steps 2 and 3 don't touch the TD chain)
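
The scenario above can be sketched as a small user-space model. This is
only an illustration, not driver code: the model_* types and both
functions are hypothetical, the TD pointer is kept in a separate field
for simplicity, and MODEL_PTR_T is assumed to have the same value as
UHCI_PTR_T (bit 0, the terminate bit).

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_TD_IS_FREE 0x12345678u /* poison value quoted in this report */
#define MODEL_PTR_T      0x00000001u /* assumed to match UHCI_PTR_T */

/* Hypothetical, heavily simplified stand-ins for the driver structures. */
struct model_td {
	uint32_t td_token;
};

struct model_qh {
	uint32_t qh_elink;         /* element link word as the hardware sees it */
	struct model_td *first_td; /* TD the element link refers to */
};

/* Step 3: the driver marks the TD as free by overwriting its token. */
static void
model_free_std(struct model_td *td)
{
	td->td_token = MODEL_TD_IS_FREE;
}

/*
 * Step 4: a controller that already fetched the QH follows the element
 * link unless the terminate bit is set. Returns true when it would read
 * the poisoned token.
 */
static bool
model_hc_reads_poison(const struct model_qh *qh)
{
	if (qh->qh_elink & MODEL_PTR_T)
		return false;
	return qh->first_td->td_token == MODEL_TD_IS_FREE;
}
```

In this model the controller only avoids the freed TD when the element
link carries the terminate bit; unlinking the QH from the chain alone
does not help a controller that has already fetched it.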
I'm not entirely sure that this is the real cause, but placing
DELAY(100) before uhci_free_std_chain() also stops the problem,
which supports the above scenario.
First, place the machine under heavy system load:
$ sh -c "while true; do dd if=/dev/zero of=zero count=256000; rm zero; sleep 2;done" &
Then create a ppp link over a uftdi(4) serial port and transmit
some data over the link:
$ while true; do scp /netbsd 10.9.8.7:/tmp; sleep 4; done
The following diff seems to solve the problem.
It might be necessary to add some DELAY() before calling
uhci_free_std_chain() for some hardware.
cvs diff: Diffing .
RCS file: /export/kiku/NetBSD/NetBSD-CVS/syssrc/sys/dev/usb/uhci.c,v
retrieving revision 1.120
diff -u -r1.120 uhci.c
--- uhci.c 2000/06/01 15:51:26 1.120
+++ uhci.c 2000/07/23 13:23:52
@@ -979,6 +979,7 @@
DPRINTFN(10, ("uhci_remove_bulk: sqh=%p\n", sqh));
+ sqh->qh.qh_elink = htole32(UHCI_PTR_T);
pqh = uhci_find_prev_qh(sc->sc_bulk_start, sqh);
pqh->hlink = sqh->hlink;
pqh->qh.qh_hlink = sqh->qh.qh_hlink;
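
As a rough illustration of the ordering the diff introduces, here is a
user-space sketch. It is not the driver code: struct model_sqh,
model_find_prev() and model_remove_bulk() are hypothetical names, the
htole32() conversion and the hardware qh_hlink word are left out, and
MODEL_PTR_T is assumed to equal UHCI_PTR_T (bit 0).

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_PTR_T 0x00000001u /* assumed to match UHCI_PTR_T */

/* Hypothetical simplified soft QH: software link plus element link word. */
struct model_sqh {
	struct model_sqh *hlink; /* next QH in the software chain */
	uint32_t qh_elink;       /* element link word read by the hardware */
};

/* Walk the chain from start and return the QH whose hlink is sqh. */
static struct model_sqh *
model_find_prev(struct model_sqh *start, struct model_sqh *sqh)
{
	struct model_sqh *p = start;

	while (p != NULL && p->hlink != sqh)
		p = p->hlink;
	return p;
}

/*
 * Terminate the element link first, so the TDs are no longer reachable
 * through this QH, then unlink the QH from the chain.
 * Returns 0 on success, -1 if sqh is not on the chain.
 */
static int
model_remove_bulk(struct model_sqh *start, struct model_sqh *sqh)
{
	struct model_sqh *pqh;

	sqh->qh_elink = MODEL_PTR_T; /* corresponds to the added line */
	pqh = model_find_prev(start, sqh);
	if (pqh == NULL)
		return -1;
	pqh->hlink = sqh->hlink;
	return 0;
}
```

The point of the ordering is that the element link is terminated before
anything else changes, so freeing the TDs afterwards happens only once
the QH no longer points at them.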