Subject: Re: 1.6E stray irq for SCSI controller halts system.
To: None <port-alpha@netbsd.org>
From: Michael L. Hitch <mhitch@netbsd.org>
List: port-alpha
Date: 08/04/2002 11:46:30
On Sat, 3 Aug 2002, Stephen M. Jones wrote:

> So, while doing this I got a couple of stray interrupt complaints and repeatedly
> after 3 of them, the operating system halted.  I've seen this before on the
> 5305 (this is the API CS20 NetBSD 1.6E (SVERIGE) #0: Tue Jul 30 22:22:12 UTC 2002)
> with a 3COM ethernet controller.  Exactly the same scenario when you'd have stray
> interrupts .. roughly about 10 or 15 .. the system would just lock up.

  The "lock up" is due to the way the interrupt handling is implemented on
the alpha.  When none of the driver interrupt routines specified for a
given interrupt level acknowledges that the driver has processed the
interrupt, the alpha interrupt handler considers it a stray interrupt and
increments a counter.  After a certain number of stray interrupts, it
should print out a message about "stopped logging".  What that message
really means is that the interrupt level has been disabled and no more
interrupts will occur.  The 3COM elinkxl driver had problems with that (I
think that has been fixed in -current and 1.6).  I've been able to clear
this condition by getting into the debugger and calling the routine to
enable the interrupt (I can't remember the exact procedure at the moment).
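
  As a rough illustration of the behaviour described above, here is a
minimal user-space sketch in C.  It is not the actual alpha kernel code:
the names (irq_line, irq_dispatch, irq_enable), the threshold STRAY_LIMIT
and the choice to reset the count on re-enable are all assumptions made
for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define STRAY_LIMIT  5   /* hypothetical threshold, not the kernel's value */
#define MAX_HANDLERS 4   /* hypothetical limit on shared handlers */

/* A driver interrupt routine returns nonzero if it handled the interrupt. */
typedef int (*intr_handler_t)(void *arg);

struct irq_line {
    bool           enabled;
    int            strays;      /* never cleared by irq_dispatch() */
    int            nhandlers;
    intr_handler_t handler[MAX_HANDLERS];
    void          *arg[MAX_HANDLERS];
};

static void irq_attach(struct irq_line *irq, intr_handler_t fn, void *arg)
{
    assert(irq != NULL && irq->nhandlers < MAX_HANDLERS);
    irq->handler[irq->nhandlers] = fn;
    irq->arg[irq->nhandlers] = arg;
    irq->nhandlers++;
}

/* Deliver one interrupt; returns nonzero if some driver claimed it. */
static int irq_dispatch(struct irq_line *irq)
{
    int claimed = 0;

    assert(irq != NULL);
    if (!irq->enabled)
        return 0;               /* line is masked, nothing is delivered */

    for (int i = 0; i < irq->nhandlers; i++)
        claimed |= (irq->handler[i](irq->arg[i]) != 0);

    if (!claimed) {
        irq->strays++;
        if (irq->strays >= STRAY_LIMIT) {
            /* "stopped logging" really means the line is now disabled. */
            irq->enabled = false;
            printf("stray interrupt; stopped logging\n");
        } else {
            printf("stray interrupt\n");
        }
    }
    return claimed;
}

/* Stand-in for the enable routine one would call from the debugger. */
static void irq_enable(struct irq_line *irq)
{
    assert(irq != NULL);
    irq->strays = 0;            /* assumption of this sketch */
    irq->enabled = true;
}

/* Two trivial example handlers: one never claims, one always claims. */
static int handler_ignores(void *arg) { (void)arg; return 0; }
static int handler_claims(void *arg)  { (void)arg; return 1; }
```

  With only handler_ignores attached, the line is disabled on the
STRAY_LIMIT-th unclaimed interrupt and stays that way until irq_enable()
is called, no matter how far apart the strays were.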

  It seems to me that the interrupt disabling on stray interrupts should
only occur when the stray interrupts are continuous or occurring quite
frequently.  One way to do that would be to reset the stray counter when a
driver indicates that it acknowledged the interrupt, although that would
add a little more overhead to the interrupt processing.  Another
possibility would be to have a periodic timer go through and clear all the
stray counters.
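
  The two ideas above could look roughly like the following C sketch.
Everything here is hypothetical (irq_state, irq_account, stray_clear_all,
the STRAY_LIMIT value and the table size); it only shows where the counter
would be cleared, not how the real interrupt code is organised.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define STRAY_LIMIT 5   /* hypothetical threshold */
#define NIRQ        4   /* hypothetical number of interrupt lines */

struct irq_state {
    int  strays;
    bool enabled;
};

static struct irq_state irq_table[NIRQ];

static void irq_init(void)
{
    for (size_t i = 0; i < NIRQ; i++) {
        irq_table[i].strays = 0;
        irq_table[i].enabled = true;
    }
}

/*
 * Account for one delivered interrupt.  'claimed' is true when some
 * driver routine acknowledged it.
 */
static void irq_account(struct irq_state *irq, bool claimed)
{
    assert(irq != NULL);

    if (claimed) {
        /* Option 1: a claimed interrupt clears the stray count, so only
         * an unbroken run of strays can reach the limit.  This is the
         * extra store on the normal interrupt path. */
        irq->strays = 0;
        return;
    }
    if (++irq->strays >= STRAY_LIMIT)
        irq->enabled = false;
}

/*
 * Option 2: called from a periodic timer.  Strays only disable a line
 * if STRAY_LIMIT of them arrive between two calls.
 */
static void stray_clear_all(void)
{
    for (size_t i = 0; i < NIRQ; i++)
        irq_table[i].strays = 0;
}
```

  Either option alone keeps occasional strays from accumulating; the
timer variant leaves the normal interrupt path untouched at the cost of
a periodic sweep over the table.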

--
Michael L. Hitch			mhitch@montana.edu
Computer Consultant
Information Technology Center
Montana State University	Bozeman, MT	USA