Subject: Re: isp(4) with Q-Logic 2340 suffers "stray interrupts" under load
To: Manuel Bouyer <bouyer@antioche.eu.org>
From: Greg A. Woods <woods@weird.com>
List: tech-kern
Date: 03/28/2005 18:48:24
[ On Monday, March 28, 2005 at 13:44:05 (+0200), Manuel Bouyer wrote: ]
> Subject: Re: isp(4) with Q-Logic 2340 suffers "stray interrupts" under load
>
> The com(4) and the isp(4) interrupt routine contain a loop, which will check
> if another interrupt was raised while handling this one. I think there is a
> race condition that can cause the interrupt controller to have a pending
> interrupt which has already been handled by the loop in the interrupt handler.
> This is harmless and probably can't be avoided (either we have pending
> interrupts which have already been handled, or we may miss real interrupts),
> but will cause the interrupt controller driver to think it's a spurious
> interrupt.

I think I'm even more confused now than before!  ;-)

How can there be a race condition where the interrupt controller driver
sees a pending interrupt that's already been handled?

Is this because the former (the interrupt controller driver) might be
woken up on another CPU while the com(4) driver's interrupt service
routine is still running on a CPU?

Can this even happen on uniprocessor systems if servicing the
device-specific interrupt doesn't clear the IRQ line?  I.e. even though,
for example, the com(4) driver clears the UART interrupt status
register, the interrupt controller has still been triggered anyway and
will deliver another interrupt to the interrupt controller driver?  Is it
possible (and wise) for the interrupt controller driver to check that an
interrupt really is still pending when it gets invoked?  I.e. will
clearing a device's ISR de-assert the interrupt in the interrupt
controller and allow the controller driver to avoid searching for
handlers for non-existent interrupts?


If either is the case then an interrupt must never be disabled just
because it happens to generate the occasional "stray" (which in this
case would be a very misleading description).  I'm not even sure they're
worth logging at all, unless they come in fast and furious (i.e. they
come from some active device that's not attached to a driver).  Maybe
the logging logic can be inverted to keep a timestamp and count and only
print a log message if N un-handled interrupts are received in X secs.
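
A minimal sketch of that inverted logging logic, purely for illustration.
The structure and function names are hypothetical (nothing here exists in
the NetBSD tree), time is assumed to be a monotonic count of seconds
supplied by the caller, and the window is a simple fixed one that opens at
the first stray rather than a true sliding window:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-IRQ bookkeeping; not an existing kernel structure. */
struct stray_window {
	uint64_t start;   /* time (seconds) the current window opened */
	unsigned count;   /* strays seen in the current window */
	unsigned limit;   /* N: strays per window before we complain */
	uint64_t period;  /* X: window length in seconds */
};

/*
 * Record one un-handled interrupt seen at time `now`.
 * Returns true when the caller should print a log message, which is
 * exactly once per window: when the N-th stray of that window arrives.
 */
static bool
stray_should_log(struct stray_window *w, uint64_t now)
{
	assert(w != NULL && w->limit > 0);

	if (w->count == 0 || now - w->start >= w->period) {
		/* First stray ever, or the old window expired: start over. */
		w->start = now;
		w->count = 0;
	}
	w->count++;
	return w->count == w->limit;
}
```

With limit = 3 and period = 10, an occasional stray never logs anything,
while three strays inside ten seconds produce a single message.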


Also, while falling asleep last night I considered the situation with
shared interrupts and wondered if two devices were sharing an interrupt
and one of them caused/suffered these "stray" interrupts occasionally,
then would disabling the interrupt kill both devices?  If so then I
think that's also reason enough to never disable interrupts due to
ongoing stray interrupts.


Also, BTW, it seems my idea of resetting the stray counter was maybe a
latent memory of reading the following code for the AS4x00/AS1200
(alpha/pci/pci_kn300.c), and/or the associated commit log:

void
kn300_iointr(arg, vec)
	void *arg;
	unsigned long vec;
{
	struct mcpcia_softc *mcp;
	u_long irq;

	irq = SCB_VECTOIDX(vec - MCPCIA_VEC_PCI);

	if (alpha_shared_intr_dispatch(kn300_pci_intr, irq)) {
		/*
		 * Any claim of an interrupt at this level is a hint to
		 * reset the stray interrupt count- elsewise a slow leak
		 * over time will cause this level to be shutdown.
		 */
		alpha_shared_intr_set_maxstrays(kn300_pci_intr, irq, 25);
		return;
	}



The alpha_shared_intr_set_maxstrays() call comes from:

----------------------------
revision 1.18
date: 2000/02/10 07:45:43;  author: mjacob;  state: Exp;  lines: +4 -5
branches:  1.18.2;
Reset maxstray count if we get a good interrupt for a level.
----------------------------



I guess if there really is some tangible benefit to disabling IRQs that
generate a lot of strays and never get handled then this change could be
propagated to the pci_6600 code (and wherever else it's missing) instead
of just commenting out the disabling code as I've done for now.....
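
For illustration only, the "reset on a good interrupt" policy might be
sketched as below.  This is not the real alpha_shared_intr code: the
structure, the function name, and the exact way strays are counted are
assumptions; only the value 25 is taken from the kn300_iointr() excerpt
above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAXSTRAYS 25	/* value passed to set_maxstrays in kn300_iointr() */

/* Hypothetical per-IRQ state; not the real alpha_shared_intr structures. */
struct irq_state {
	unsigned strays_left;	/* strays tolerated before disabling */
	bool	 disabled;	/* true once the IRQ has been shut down */
};

/*
 * Account for one interrupt on this IRQ.  `claimed` is true if some
 * handler claimed it.  Returns true if the IRQ is (now) disabled.
 */
static bool
irq_account(struct irq_state *s, bool claimed)
{
	assert(s != NULL);

	if (s->disabled)
		return true;
	if (claimed) {
		/* A good interrupt: forgive all earlier strays. */
		s->strays_left = MAXSTRAYS;
		return false;
	}
	if (s->strays_left > 0)
		s->strays_left--;
	if (s->strays_left == 0)
		s->disabled = true;
	return s->disabled;
}
```

Under this policy only an unbroken run of 25 strays shuts the IRQ down; a
working device that produces the occasional stray never gets there.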


-- 
						Greg A. Woods

H:+1 416 218-0098  W:+1 416 489-5852 x122  VE3TCP  RoboHack <woods@robohack.ca>
Planix, Inc. <woods@planix.com>          Secrets of the Weird <woods@weird.com>