Subject: Re: Still ehci+umass trouble...
To: None <current-users@netbsd.org>
From: Martin S. Weber <Ephaeton@gmx.net>
List: current-users
Date: 01/23/2006 16:30:36
On Mon, Jan 23, 2006 at 01:49:39PM +0100, Juan RP wrote:
> On Mon, 23 Jan 2006 10:26:56 +0100
> "Martin S. Weber" <Ephaeton@gmx.net> wrote:
> 
> > On Sun, Jan 22, 2006 at 04:15:02PM +0100, Martin S. Weber wrote:
> > > (...)
> > > 1. I have these (exact same) problems with an Intel EHCI too:
> > > 
> > > NetBSD rfhinf038 3.99.11 NetBSD 3.99.11 (GENERIC) 
> > >  #0: Sat Nov 19 17:13:02 CET 2005  
> > >  root@rfhinf038:/usr/home/netbsd/obj/sys/arch/i386/compile/GENERIC
> > > i386 (...)
> > 
> > Updated.
> > 
> > NetBSD rfhinf038 3.99.15 NetBSD 3.99.15 (GENERIC.MP_USBDEBUG) 
> >  #0: Sun Jan 22 17:52:54 CET 2006  
> >  root@circe.entropie.net:/src/obj/sys/arch/i386/compile/GENERIC.MP_USBDEBUG
> > i386
> > 
> > ehci0 at pci0 dev 29 function 7: Intel 82801EB/ER USB EHCI Controller (rev. 0x02)
> > ehci0: interrupting at ioapic0 pin 23 (irq 5)
> > ehci0: BIOS has given up ownership
> > ehci0: EHCI version 1.0
> > ehci0: companion controllers, 2 ports each: uhci0 uhci1 uhci2 uhci3
> > usb4 at ehci0: USB revision 2.0
> 
> The code we added from OpenBSD only tries to fix this problem on VIA
> EHCI controllers; your controller is an Intel ICH.
> 

Uh, yes. I had the umass lying next to the machine with the Intel EHCI, and
I had claimed that this problem also exists on Intel controllers, so I
tested the latest kernel to back up that claim. This was the accompanying
data (more detailed output follows).

> FreeBSD uses this code on VIA and ATI EHCI controllers to work around
> umass stalls.
> 
> Maybe you could try to match your EHCI controller and see if that
> works around your problem.
> 
> Something like:
> 
>         /* Enable workaround for dropped interrupts as required */
>         if (sc->sc.sc_id_vendor == PCI_VENDOR_VIATECH)
>                 sc->sc.sc_flags |= EHCIF_DROPPED_INTR_WORKAROUND;
> 
> To
> 
> 	if (sc->sc.sc_id_vendor == PCI_VENDOR_VIATECH ||
> 	    sc->sc.sc_id_vendor == PCI_VENDOR_INTEL)
> 		sc->sc.sc_flags |= EHCIF_DROPPED_INTR_WORKAROUND;
> 

I did exactly that, and I still managed to produce a umass stall. As you
might guess, though, the output is different this time (especially now that
I've been told about umassdebug & ehcidebug)...
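
For anyone who wants to try the same change, here is a minimal standalone
sketch of the vendor match, written so it compiles outside the kernel. The
struct ehci_softc_sketch type and the ehci_apply_quirks() function are
hypothetical stand-ins, and the numeric value given to
EHCIF_DROPPED_INTR_WORKAROUND is an assumption; only the field and flag names
come from the snippet quoted above. It is not the actual ehci_pci.c code.

```c
#include <stdint.h>

/* PCI vendor IDs as assigned by the PCI-SIG. */
#define PCI_VENDOR_VIATECH	0x1106
#define PCI_VENDOR_INTEL	0x8086

/* Assumption for this sketch: the real bit value may differ. */
#define EHCIF_DROPPED_INTR_WORKAROUND	0x01

/* Hypothetical stand-in for the two softc fields used in the quoted code. */
struct ehci_softc_sketch {
	unsigned int sc_id_vendor;	/* PCI vendor ID of the controller */
	unsigned int sc_flags;		/* quirk flags */
};

/* Enable the dropped-interrupt workaround for the vendors under test. */
void
ehci_apply_quirks(struct ehci_softc_sketch *sc)
{
	switch (sc->sc_id_vendor) {
	case PCI_VENDOR_VIATECH:
	case PCI_VENDOR_INTEL:
		sc->sc_flags |= EHCIF_DROPPED_INTR_WORKAROUND;
		break;
	default:
		/* Leave every other vendor untouched. */
		break;
	}
}
```

A switch makes it easy to add further vendors later without growing the
condition; whether that is preferable to the if form is a matter of taste.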

Ok, for completeness (I'm holding off on the PR until I've run my VIA test),
this is the current state of anarchy:

[*] := while :; do
for i in */*/*; do [ $(( $RANDOM % 5 )) -eq 3 ] || continue; cat "$i" >| ~/junk_x; done
sleep $(( $RANDOM % 30 ))
done

Talking about:

EHCI:

ehci0 at pci0 dev 29 function 7: Intel 82801EB/ER USB EHCI Controller (rev. 0x02)
ehci0: offs=32
ehci0: interrupting at ioapic0 pin 23 (irq 5)
ehci_pci_attach: companion uhci0
ehci_pci_attach: companion uhci1
ehci_pci_attach: companion uhci2
ehci_pci_attach: companion uhci3
ehci_dump_caps: legsup=0x00000001 legctlsts=0xc0000000
ehci0: BIOS has given up ownership
ehci_init: start
ehci0: EHCI version 1.0
ehci_init: sparams=0x104208
ehci0: companion controllers, 2 ports each: uhci0 uhci1 uhci2 uhci3
ehci_init: cparams=0x6871
ehci0: resetting
ehci0: flsize=1024
QH(0xcaeafe40) at 0x01914e40:
  link=0x01914e42<QH>
  endp=0x0000a000
    addr=0x00 inact=0 endpt=0 eps=2 dtc=0 hrecl=1
    mpl=0x0 ctl=0 nrl=0
  endphub=0x00000000
    smask=0x00 cmask=0x00 huba=0x00 port=0 mult=0
  curqtd=0x00000001<T>
Overlay qTD:
  next=0x00000001<T> altnext=0x00000001<T>
  status=0x00000040: toggle=0 bytes=0x0 ioc=0 c_page=0x0
    cerr=0 pid=0 stat=0x40<HALTED>
  buffer[0]=0x00000000
  buffer[1]=0x00000000
  buffer[2]=0x00000000
  buffer[3]=0x00000000
  buffer[4]=0x00000000
usb4 at ehci0: USB revision 2.0


UMASS:
umass0 at uhub5 port 3 configuration 1 interface 0
umass0: Genesys Logic USB TO IDE, rev 2.00/0.33, addr 3
umass0: using SCSI over Bulk-Only
scsibus0 at umass0: 2 targets, 1 lun per target
sd0 at scsibus0 target 0 lun 0: <IC25N080, ATMR04-0, 0811> disk fixed
sd0: fabricating a geometry
sd0: 76319 MB, 76319 cyl, 64 head, 32 sec, 512 bytes/sect x 156301488 sectors
sd0: fabricating a geometry

Just a -current kernel with umassdebug / ehcidebug:

(mount)
ehci_alloc_sqtd_chain: start len=65536
....
(start writing [rsync])
ehci_alloc_sqtd_chain: start len=65536
....
(start reading, too [*] * 1)
umass0: BBB reset failed, IOERROR
ehci_device_clear_toggle: epipe=0xc178da80 status=0x8c008148
umass0: BBB bulk-in clear stall failed, IOERROR
ehci_device_clear_toggle: epipe=0xc178d080 status=0x8c00
umass0: BBB bulk-out clear stall failed, IOERROR
umass0: BBB reset failed, IOERROR
ehci_device_clear_toggle: epipe=0xc178da80 status=0x8c008148
umass0: BBB bulk-in clear stall failed, IOERROR
ehci_device_clear_toggle: epipe=0xc178d080 status=0x1f8049
umass0: BBB bulk-out clear stall failed, IOERROR
umass0: BBB reset failed, IOERROR
ehci_device_clear_toggle: epipe=0xc178da80 status=0x8c008148
umass0: BBB bulk-in clear stall failed, IOERROR
ehci_device_clear_toggle: epipe=0xc178d080 status=0x1f8049
umass0: BBB bulk-out clear stall failed, IOERROR
umass0: BBB reset failed, IOERROR
...
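
To make the status= values above easier to read, here is a small sketch that
splits such a word into the same fields the QH dump prints (toggle, bytes,
ioc, c_page, cerr, pid, stat). It assumes that the value printed by
ehci_device_clear_toggle is the overlay qTD token, laid out as in the EHCI
1.0 specification. The names qtd_token, qtd_decode and qtd_print are made up
for this example and are not kernel functions.

```c
#include <stdint.h>
#include <stdio.h>

/* qTD token status bits (low byte), per the EHCI 1.0 specification. */
#define QTD_ACTIVE	0x80
#define QTD_HALTED	0x40
#define QTD_BUFERR	0x20
#define QTD_BABBLE	0x10
#define QTD_XACTERR	0x08
#define QTD_MISSED	0x04
#define QTD_SPLIT	0x02
#define QTD_PING	0x01

/* Hypothetical container for the decoded fields. */
struct qtd_token {
	unsigned toggle;	/* bit 31: data toggle */
	unsigned bytes;		/* bits 30..16: bytes still to transfer */
	unsigned ioc;		/* bit 15: interrupt on complete */
	unsigned c_page;	/* bits 14..12: current buffer page */
	unsigned cerr;		/* bits 11..10: error counter */
	unsigned pid;		/* bits 9..8: 0=OUT, 1=IN, 2=SETUP */
	unsigned stat;		/* bits 7..0: status flags */
};

/* Split a 32-bit token word into its fields. */
struct qtd_token
qtd_decode(uint32_t s)
{
	struct qtd_token t;

	t.toggle = (s >> 31) & 0x1;
	t.bytes  = (s >> 16) & 0x7fff;
	t.ioc    = (s >> 15) & 0x1;
	t.c_page = (s >> 12) & 0x7;
	t.cerr   = (s >> 10) & 0x3;
	t.pid    = (s >> 8) & 0x3;
	t.stat   = s & 0xff;
	return t;
}

/* Print the fields in roughly the format of the QH dump. */
void
qtd_print(uint32_t s)
{
	struct qtd_token t = qtd_decode(s);

	printf("status=0x%08x: toggle=%u bytes=0x%x ioc=%u c_page=0x%x\n",
	    (unsigned)s, t.toggle, t.bytes, t.ioc, t.c_page);
	printf("  cerr=%u pid=%u stat=0x%02x%s%s%s\n", t.cerr, t.pid, t.stat,
	    (t.stat & QTD_HALTED) ? " HALTED" : "",
	    (t.stat & QTD_XACTERR) ? " XACTERR" : "",
	    (t.stat & QTD_PING) ? " PING" : "");
}
```

Under that assumption, 0x8c008148 reads as an IN transfer (pid=1) with 0xc00
bytes outstanding and stat=0x48 (HALTED plus XACTERR), while 0x1f8049 reads
as an OUT transfer (pid=0) with 31 bytes outstanding and stat=0x49. 31 bytes
is the size of a Bulk-Only command block wrapper, which may or may not be a
coincidence.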


With EHCIF_DROPPED_INTR_WORKAROUND activated for PCI_VENDOR_INTEL in ehci_pci.c:

(mount, start writing [rsync])
(also running 2, at times 3, instances of the reader [*])
(all fine so far. Ok Seems stressable. Stop those [*]s.)
ehci_intrlist_timeout
ehci_alloc_sqtd_chain: start len=65536

(and here xmms tries to read the "next" mp3 from the umass)

ehci_alloc_sqtd_chain: start len=65536
ehci_intrlist_timeout
...(about 50 follow)...
ehci_intrlist_timeout
ehci_timeout: exfer=0xc1654900
ehci_timeout_task: xfer=0xc1654900
ehci_abort_xfer: xfer=0xc1654900 pipe=0xc1660000
ehci_intr1: door bell
ehci_intrlist_timeout
...(more follow)...
ehci_intrlist_timeout
ehci_timeout: exfer=0xc168a200
ehci_intrlist_timeout
ehci_timeout_task: xfer=0xc168a200
ehci_abort_xfer: xfer=0xc168a200 pipe=0xc1660180
ehci_intr1: door bell
ehci_idone: aborted xfer=0xc168a200
umass0: BBB reset failed, TIMEOUT
ehci_device_clear_toggle: epipe=0xc1660000 status=0x6008d80
...
ehci_intrlist_timeout
ehci_timeout: exfer=0xc168af00
ehci_intrlist_timeout
ehci_timeout_task: xfer=0xc168af00
ehci_abort_xfer: xfer=0xc168af00 pipe=0xc1660180
ehci_intr1: door bell
ehci_idone: aborted xfer=0xc168af00
umass0: BBB bulk-in clear stall failed, TIMEOUT
ehci_device_clear_toggle: epipe=0xc1660380 status=0x8c00
ehci_intrlist_timeout
....
ehci_intrlist_timeout
ehci_timeout: exfer=0xc1654300
ehci_timeout_task: xfer=0xc1654300
ehci_abort_xfer: xfer=0xc1654300 pipe=0xc1660180
ehci_intr1: door bell
umass0: BBB bulk-out clear stall failed, TIMEOUT
ehci_intrlist_timeout


Sigh.  Uhm. More stuff to follow once I'm near the VIA EHCI.

Regards,

-Martin

PS: yes yes yes, this will all be summarized in a PR. If you have further
ideas on how I could abuse the hardware, tell me :)