Subject: Re: isp(4) with Q-Logic 2340 suffers "stray interrupts" under load
To: NetBSD Kernel Technical Discussion List <tech-kern@NetBSD.org>
From: Bill Studenmund <wrstuden@netbsd.org>
List: port-alpha
Date: 03/28/2005 17:51:26

On Mon, Mar 28, 2005 at 06:48:24PM -0500, Greg A. Woods wrote:
> [ On Monday, March 28, 2005 at 13:44:05 (+0200), Manuel Bouyer wrote: ]
> > Subject: Re: isp(4) with Q-Logic 2340 suffers "stray interrupts" under load
> >
> > The com(4) and the isp(4) interrupt routines contain a loop, which will check
> > if another interrupt was raised while handling this one. I think there is a
> > race condition that can cause the interrupt controller to have a pending
> > interrupt which has already been handled by the loop in the interrupt handler.
> > This is harmless and probably can't be avoided (either we have pending
> > interrupts which have already been handled, or we may miss real interrupts),
> > but will cause the interrupt controller driver to think it's a spurious
> > interrupt.
>
> I think I'm even more confused now than before!  ;-)

Oops!

> How can there be a race condition where the interrupt controller driver
> sees a pending interrupt that's already been handled?

I don't know the hardware, but consider the case of edge-sensitive
interrupts. The interrupt state is noted by the controller when the device
asserts the interrupt (either raises or lowers the voltage). Then the
device releases the interrupt line.

I apologize if the hardware in question doesn't use edge-sensitive
interrupts.

Consider the case:

1) Device asserts interrupt. Edge goes by and gets noted.
2) Interrupt handling begins, and the interrupt is cleared from the
interrupt controller chip.
3) Before the interrupt handling routine finishes, another interrupt is
asserted. Edge goes by and gets noted by the interrupt controller chip.
4) Interrupt routine satisfies ALL interrupts, both from step 1 and step
3. Then returns.
5) Pending interrupt on the chip gets dispatched, but the device is already
satisfied, so the interrupt looks spurious.
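
To make the sequence concrete, here is a toy user-space model of it. It
is only a sketch under stated assumptions, not NetBSD code: the controller
is assumed to latch at most one pending edge per line (a bit, not a
counter), and struct toy_device, struct toy_pic, device_raise(),
device_intr() and run_scenario() are all hypothetical names.

```c
#include <stdbool.h>

/*
 * Toy model, not NetBSD code.  The controller latches at most one
 * pending edge per line (a bit, not a counter); the driver handler
 * loops until the device has no work left, like the com(4)/isp(4)
 * loops described in the thread.
 */
struct toy_device {
	int events;		/* work waiting to be serviced */
};

struct toy_pic {
	bool pending;		/* latched edge for this line */
};

/* Device has new work and drives an edge onto the line. */
void
device_raise(struct toy_device *dev, struct toy_pic *pic)
{
	dev->events++;
	pic->pending = true;
}

/*
 * Driver handler.  Returns nonzero if it found work.  If
 * 'raise_midway' is set, a second interrupt arrives while the
 * handler is still looping (step 3); the loop then services it
 * too (step 4).
 */
int
device_intr(struct toy_device *dev, struct toy_pic *pic, bool raise_midway)
{
	int handled = 0;

	while (dev->events > 0) {
		dev->events--;
		handled = 1;
		if (raise_midway) {
			device_raise(dev, pic);	/* edge latched again */
			raise_midway = false;
		}
	}
	return handled;
}

/* Replay steps 1-5; returns how many dispatches found no work. */
int
run_scenario(bool raise_midway)
{
	struct toy_device dev = { 0 };
	struct toy_pic pic = { false };
	int strays = 0;

	device_raise(&dev, &pic);		/* step 1 */
	while (pic.pending) {
		pic.pending = false;		/* step 2: ack at controller */
		if (!device_intr(&dev, &pic, raise_midway))
			strays++;		/* step 5: looks spurious */
		raise_midway = false;		/* only one extra edge */
	}
	return strays;
}
```

With no second edge, run_scenario(false) sees no strays. With a second
edge arriving mid-handler, run_scenario(true) sees exactly one: the
dispatch in step 5 that finds the device already satisfied.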

> Is this because the former (the interrupt controller driver) might be
> woken up on another CPU while the com(4) driver's interrupt service
> routine is still running on a CPU?
>
> Can this even happen on uniprocessor systems if servicing the
> device-specific interrupt doesn't clear the IRQ line?  I.e. even though,
> for example, the com(4) driver clears the UART interrupt status register
> that the interrupt controller has still been triggered anyway and will
> deliver another interrupt to the interrupt controller driver?  Is it

Yes, I think this may be it. But it's not that the IRQ line hasn't been
cleared; I think it's that the interrupt controller has already latched
the trigger.

> possible (and wise) for the interrupt controller driver to check that an
> interrupt really is still pending when it gets invoked?  I.e. will
> clearing a device's ISR de-assert the interrupt in the interrupt
> controller and allow the controller driver to avoid searching for
> handlers for non-existent interrupts?

For edge interrupts, I don't think there's a way to de-assert the
interrupt in the device-specific handler.

> If either are the case then an interrupt must never be disabled just
> because it happens to generate the occasional "stray" (which in this
> case would be a very misleading description).  I'm not even sure they're
> worth logging at all, unless they come in fast and furious (i.e. they
> come from some active device that's not attached to a driver).  Maybe
> the logging logic can be inverted to keep a timestamp and count and only
> print a log message if N un-handled interrupts are received in X secs.

I'm not sure what to do here. A "stray" interrupt just after we satisfied
an interrupt is OK, but a "stray" after a "stray" isn't.
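
A minimal sketch of that rule, assuming the interrupt controller driver
keeps one bit of state per line and approximating "just after" as "the
previous dispatch on this line was handled" (no time bound). struct
irq_state and irq_note_result() are hypothetical names, not existing
NetBSD interfaces.

```c
#include <stdbool.h>

/*
 * Hypothetical per-IRQ bookkeeping: remember whether the previous
 * dispatch on this line was a stray.  A stray right after a handled
 * interrupt is treated as the benign latched-edge case; a stray
 * following a stray is flagged.
 */
struct irq_state {
	bool last_was_stray;
};

/* Record one dispatch result; returns true if it looks suspicious. */
bool
irq_note_result(struct irq_state *is, bool handled)
{
	bool suspicious = false;

	if (!handled)
		suspicious = is->last_was_stray;
	is->last_was_stray = !handled;
	return suspicious;
}
```

Handled, then stray: not flagged. Stray, then stray: the second one is
flagged. A handled interrupt resets the state.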

> Also, while falling asleep last night I considered the situation with
> shared interrupts and wondered if two devices were sharing an interrupt
> and one of them caused/suffered these "stray" interrupts occasionally,
> then would disabling the interrupt kill both devices?  If so then I
> think that's also reason enough to never disable interrupts due to
> ongoing stray interrupts.

I think that would happen. So yes, disabling is dangerous. The problem
though is how do we know if it's happening "occasionally"...
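
One possible answer, sketched under assumptions: combine Greg's "N
un-handled interrupts in X secs" idea with a dispatcher for a shared line
that calls every handler on the chain and counts a stray only if none of
them claims the interrupt, and that only reports, never disables. It
assumes handlers return nonzero when the interrupt was theirs; the
thresholds, struct names, toy_intr() and irq_dispatch() are all
hypothetical. Calling every handler rather than stopping at the first
claimer is a choice made here because another device on the same line
may need service too.

```c
#include <stdbool.h>
#include <stddef.h>

#define STRAY_LIMIT	10	/* hypothetical N */
#define STRAY_WINDOW	5	/* hypothetical X, in seconds */

/* Handler convention assumed here: nonzero means "this one was mine". */
typedef int (*intr_fn)(void *arg);

struct intr_handler {
	intr_fn fn;
	void *arg;
	struct intr_handler *next;	/* next device sharing the line */
};

struct irq_line {
	struct intr_handler *chain;
	long window_start;		/* seconds, from a monotonic clock */
	int strays_in_window;
};

/* Toy device handler: claims the interrupt if its status flag is set. */
int
toy_intr(void *arg)
{
	int *status = arg;
	int mine = *status;

	*status = 0;			/* "clear the device's ISR" */
	return mine;
}

/*
 * Dispatch one interrupt at time 'now'.  Every handler on the shared
 * chain is called; the event counts as stray only if none claims it.
 * Returns true once per window, on the Nth stray, meaning "log a
 * message now".  The line is never disabled.
 */
bool
irq_dispatch(struct irq_line *line, long now)
{
	struct intr_handler *ih;
	int claimed = 0;

	for (ih = line->chain; ih != NULL; ih = ih->next)
		claimed |= (ih->fn(ih->arg) != 0);
	if (claimed)
		return false;

	if (now - line->window_start >= STRAY_WINDOW) {
		line->window_start = now;	/* start a fresh window */
		line->strays_in_window = 0;
	}
	return ++line->strays_in_window == STRAY_LIMIT;
}
```

An occasional stray after a handled interrupt then costs a counter
increment and nothing else; only a burst of N strays within X seconds
produces a log message.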

Take care,

Bill
