Subject: Re: kern/35553: azalia hangs an Optiplex 745
To: None <gnats-bugs@NetBSD.org>
From: Michael L. Hitch <mhitch@lightning.msu.montana.edu>
List: netbsd-bugs
Date: 08/27/2007 13:55:37
On Fri, 24 Aug 2007, Michael L. Hitch wrote:

> So all these problems seem to come back to the uhci/ehci interaction with
> azalia and bge.  One thing to note is that all three of these devices are
> sharing the same interrupt.

   And indeed, the interrupt sharing is causing the hang.  When azalia0 is 
running the codec initialization, interrupts have been enabled and the 
initialization is using interrupts.  Each time azalia0 interrupts, the
interrupt handlers for azalia0, bge0, and uhci0 are called.  Normally this
shouldn't cause any undue problems other than extra processing per
interrupt, but the uhci interrupt handler does not deal intelligently with 
the shared interrupt.  By the time the azalia codec initialization runs, 
something appears to have halted the uhci0 controller, and its status 
register contains UHCI_STS_HCH.  That status doesn't actually indicate an 
interrupt as best as I have been able to determine, but the interrupt 
handler looks at it, outputs a message about the controller halted, and 
disables access to the controller.  After the azalia codec initialization
is complete, all the USB event processes start running.  Each process has to
complete its initial discovery task before the autoconfig stuff is
completed, and the process for uhci0 never completes (presumably because 
access to uhci0 was disabled by the uhci interrupt handler), and the 
system hangs waiting for it to complete.
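
   To make the sequence above concrete, here is a small stand-alone model
of it.  This is only a sketch, not the driver code: the struct, the
"disabled" flag, and all the model_* names are made up for illustration,
and the only thing taken from the hardware is the HCHalted bit position in
the USBSTS register (bit 5 per the UHCI specification).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* HCHalted bit of USBSTS (bit 5 in the UHCI specification). */
#define UHCI_STS_HCH 0x0020

/* Hypothetical stand-in for the per-controller driver state. */
struct model_uhci {
    uint16_t status;    /* simulated USBSTS register contents */
    bool     disabled;  /* set once the handler gives up on the controller */
};

/* A handler returns 1 if it claims the interrupt, 0 otherwise. */
typedef int (*model_intr_fn)(void *);

struct model_handler {
    model_intr_fn fn;
    void         *arg;
};

/* Model of a well-behaved device (azalia0, bge0): claim only if pending. */
static int model_simple_intr(void *arg)
{
    bool *pending = arg;

    assert(pending != NULL);
    if (!*pending)
        return 0;
    *pending = false;
    return 1;
}

/* Model of the problem: any nonzero status, including HCH, is treated as
 * an interrupt, and HCH makes the handler disable the controller. */
static int model_uhci_intr(void *arg)
{
    struct model_uhci *sc = arg;

    assert(sc != NULL);
    if (sc->disabled || sc->status == 0)
        return 0;
    if (sc->status & UHCI_STS_HCH)
        sc->disabled = true;        /* "controller halted" path */
    sc->status &= UHCI_STS_HCH;     /* HCH stays set while halted */
    return 1;
}

/* On a shared line every registered handler runs for every interrupt. */
static int model_dispatch(struct model_handler *line, size_t n)
{
    int claimed = 0;

    for (size_t i = 0; i < n; i++)
        claimed += line[i].fn(line[i].arg);
    return claimed;
}
```

   With uhci0 already halted and only azalia0 actually interrupting, one
pass through model_dispatch() leaves the model's uhci0 disabled even
though it never raised an interrupt of its own.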

   When the azalia is disabled, the system will come up because all the USB 
event tasks complete the initial discovery.  However, when bge0 
interrupts, it also causes the uhci interrupt handler to run and detect
the halted status of the uhci0 controller.

   I've gotten around this problem by changing the uhci interrupt handler
to not treat UHCI_STS_HCH as a valid interrupt bit, and now my system
boots and runs normally.
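
   The idea behind the workaround can be sketched as below.  This is not
the actual change to the driver, just a stand-alone illustration: the bit
values follow the USBSTS layout in the UHCI specification, while
MODEL_INTR_BITS and model_uhci_claims() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* USBSTS bits, following the UHCI specification's register layout. */
#define UHCI_STS_USBINT 0x0001  /* transfer completion interrupt */
#define UHCI_STS_USBEI  0x0002  /* USB error interrupt */
#define UHCI_STS_RD     0x0004  /* resume detect */
#define UHCI_STS_HSE    0x0008  /* host system error */
#define UHCI_STS_HCPE   0x0010  /* host controller process error */
#define UHCI_STS_HCH    0x0020  /* host controller halted */

/* Hypothetical mask: the bits taken to indicate a real interrupt.
 * HCH is left out on purpose, since it only reports controller state. */
#define MODEL_INTR_BITS \
    (UHCI_STS_USBINT | UHCI_STS_USBEI | UHCI_STS_RD | \
     UHCI_STS_HSE | UHCI_STS_HCPE)

static_assert((MODEL_INTR_BITS & UHCI_STS_HCH) == 0,
    "HCH must not count as an interrupt cause");

/* Returns 1 if the status shows an interrupt from this controller,
 * 0 if the interrupt must belong to another device on the shared line. */
static int model_uhci_claims(uint16_t usbsts)
{
    return (usbsts & MODEL_INTR_BITS) != 0;
}
```

   With a check like this, a halted but otherwise idle controller no
longer claims interrupts raised by azalia0 or bge0, so the halted path is
never entered on their behalf.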

   I'm trying to understand how uhci0 (and uhci1 as well) get halted, but 
haven't figured that out yet.

--
Michael L. Hitch			mhitch@montana.edu
Computer Consultant
Information Technology Center
Montana State University	Bozeman, MT	USA