kern/49711: Spurious interrupts with Intel E7520 chip set can cause hangs

To: kern-bug-people%netbsd.org@localhost,gnats-admin%netbsd.org@localhost,netbsd-bugs%netbsd.org@localhost
Subject: kern/49711: Spurious interrupts with Intel E7520 chip set can cause hangs
From: tih%hamartun.priv.no@localhost
Date: Mon, 2 Mar 2015 17:10:04 +0000 (UTC)
>Number:         49711
>Category:       kern
>Synopsis:       Interrupts get routed to wrong handler under load
>Confidential:   no
>Severity:       serious
>Priority:       low
>Responsible:    kern-bug-people
>State:          open
>Class:          support
>Submitter-Id:   net
>Arrival-Date:   Mon Mar 02 17:10:00 +0000 2015
>Originator:     Tom Ivar Helbekkmo
>Release:        NetBSD 7.99.5
>Organization:
>Environment:
	
	
System: NetBSD barsoom.hamartun.priv.no 7.99.5 NetBSD 7.99.5 (BARSOOM) #128: Wed Feb 25 16:32:01 CET 2015 root%barsoom.hamartun.priv.no@localhost:/usr/obj/sys/arch/amd64/compile.amd64/BARSOOM amd64
Architecture: x86_64
Machine: amd64
>Description:

What happens seems to be that the Intel E7520 chip set has a bug where
an interrupt is being handled, and the ioapic pin temporarily masked,
and the chip set decides to make the masked interrupt pop up somewhere
else.  This somewhere else depends on the physical configuration of
the motherboard, and on the BIOS: it follows the underlying sharing of
IRQs.

I'm observing the problem on a Dell PowerEdge 2850, but it is known to
happen also on other Dell systems from the same period, as well as on
machines built by other manufacturers using the same chip set.  In my
case, the primary network interface, wm0, shares an underlying IRQ
with one uhci device, while the Dell PERC RAID controller, amr0,
shares with another uhci device.  As the machine gets loaded down with
increasing disk and network I/O, the number of interrupts falsely
routed to the uhci devices increases exponentially.  With a light
load, this will cause hangs of a few seconds at a time, where the
machine does not respond to anything.  With a sufficiently high load,
the machine will hang completely, and need a power cycle.  The problem
is worse with more CPUs active; I've not seen hangs of over half a
minute in single CPU mode.  'vmstat -i' reveals that the machine is
racking up very large numbers of interrupts on the (idle) uhci devices
during the hangs.

Since interrupts can get misrouted while previous interrupts are still
being handled, increasing I/O load increases both the number of
interrupts that could get hit, and the amount of time spent handling
interrupts, and thus being susceptible to the accidents.  This would
explain the non-linear growth of the problem with increasing load.

Googling reveals that the problem is well known, and has been worked
around in Linux and FreeBSD by introducing interrupt storm
mitigation.

>How-To-Repeat:

Load down a system built around the E7520 chip set with disk and
network I/O, and observe.

>Fix:

A workaround has been found for the particular case seen with this
Dell hardware, where disk and network interrupts get misrouted to the
uhci driver.  Joerg Sonnenberger suggested a way of modifying the uhci
driver to a) not enable interrupts from the uhci devices, and b) run
in a quasi-polled mode, using callout_schedule() to make the driver
check each uhci device HZ times per second.  This permits the uhci
driver to have a minimal execution path for its interrupt handler,
which simply returns without doing anything.

I am running with this hack in place, and have found myself unable to
get the machine to hang.  I do see periods of high interrupt rates on
the uhci ioapic lines, and the effect can be seen in increased ICMP
ECHO round trip times between this system and neighbors on the local
network (increasing from sub-millisecond to 15 to 20 milliseconds
during very heavily loaded periods), but it is a good workaround, and
running serial communications over a USB dongle over an affected uchi
device works flawlessly at the same time.

Here is the patch (replace "BARSOOM" with whatever option string or
other conditional you would like it to depend on):

Index: sys/dev/usb/uhci.c
===================================================================
RCS file: /cvsroot/src/sys/dev/usb/uhci.c,v
retrieving revision 1.264
diff -u -r1.264 uhci.c
--- sys/dev/usb/uhci.c	5 Aug 2014 06:35:24 -0000	1.264
+++ sys/dev/usb/uhci.c	2 Mar 2015 12:47:25 -0000
@@ -385,6 +385,19 @@
 	UHCICMD(sc, 0);			/* do nothing */
 }
 
+#ifdef BARSOOM
+Static int uhci_intr1(uhci_softc_t *);
+
+static void uhci_intr1_wrap(void *cookie)
+{
+	uhci_softc_t *sc = cookie;
+
+	mutex_spin_enter(&sc->sc_intr_lock);
+	uhci_intr1(sc);
+	mutex_spin_exit(&sc->sc_intr_lock);
+}
+#endif
+
 usbd_status
 uhci_init(uhci_softc_t *sc)
 {
@@ -542,8 +555,16 @@
 	DPRINTFN(1,("uhci_init: enabling\n"));
 
 	err =  uhci_run(sc, 1, 0);		/* and here we go... */
+
+#ifdef BARSOOM
+	callout_init(&sc->sc_worker, CALLOUT_MPSAFE);
+	callout_setfunc(&sc->sc_worker, uhci_intr1_wrap, sc);
+	callout_schedule(&sc->sc_worker, 1);
+#else
 	UWRITE2(sc, UHCI_INTR, UHCI_INTR_TOCRCIE | UHCI_INTR_RIE |
 		UHCI_INTR_IOCE | UHCI_INTR_SPIE);	/* enable interrupts */
+#endif
+
 	return err;
 }
 
@@ -720,8 +741,10 @@
 	UHCICMD(sc, cmd | UHCI_CMD_FGR); /* force resume */
 	usb_delay_ms_locked(&sc->sc_bus, USB_RESUME_DELAY, &sc->sc_intr_lock);
 	UHCICMD(sc, cmd & ~UHCI_CMD_EGSM); /* back to normal */
+#ifndef BARSOOM
 	UWRITE2(sc, UHCI_INTR, UHCI_INTR_TOCRCIE |
 	    UHCI_INTR_RIE | UHCI_INTR_IOCE | UHCI_INTR_SPIE);
+#endif
 	UHCICMD(sc, UHCI_CMD_MAXP);
 	uhci_run(sc, 1, 1); /* and start traffic again */
 	usb_delay_ms_locked(&sc->sc_bus, USB_RESUME_RECOVERY, &sc->sc_intr_lock);
@@ -735,6 +758,11 @@
 #endif
 
 	sc->sc_suspend = PWR_RESUME;
+
+#ifdef BARSOOM
+	callout_schedule(&sc->sc_worker, 1);
+#endif
+
 	mutex_spin_exit(&sc->sc_intr_lock);
 
 	return true;
@@ -1270,14 +1298,21 @@
 		sc->sc_bulk_end = pqh;
 }
 
+#ifndef BARSOOM
 Static int uhci_intr1(uhci_softc_t *);
+#endif
 
 int
 uhci_intr(void *arg)
 {
-	uhci_softc_t *sc = arg;
 	int ret = 0;
 
+#ifndef BARSOOM
+	uhci_softc_t *sc = arg;
+
+	if (sc == NULL)
+		return 0;
+
 	mutex_spin_enter(&sc->sc_intr_lock);
 
 	if (sc->sc_dying || !device_has_power(sc->sc_dev))
@@ -1294,6 +1329,8 @@
 
  done:
 	mutex_spin_exit(&sc->sc_intr_lock);
+#endif /* BARSOOM */
+
 	return ret;
 }
 
@@ -1313,8 +1350,13 @@
 	KASSERT(mutex_owned(&sc->sc_intr_lock));
 
 	status = UREAD2(sc, UHCI_STS) & UHCI_STS_ALLINTRS;
-	if (status == 0)	/* The interrupt was not for us. */
+
+	if (status == 0) {	/* The interrupt was not for us. */
+#ifdef BARSOOM
+		callout_schedule(&sc->sc_worker, 1);
+#endif
 		return (0);
+	}
 
 	if (sc->sc_suspend != PWR_RESUME) {
 #ifdef DIAGNOSTIC
@@ -1322,6 +1364,9 @@
 		       device_xname(sc->sc_dev));
 #endif
 		UWRITE2(sc, UHCI_STS, status); /* acknowledge the ints */
+#ifdef BARSOOM
+		callout_schedule(&sc->sc_worker, 1);
+#endif
 		return (0);
 	}
 
@@ -1359,8 +1404,13 @@
 		sc->sc_dying = 1;
 	}
 
-	if (!ack)
+	if (!ack) {
+#ifdef BARSOOM
+		callout_schedule(&sc->sc_worker, 1);
+#endif
 		return (0);	/* nothing to acknowledge */
+	}
+
 	UWRITE2(sc, UHCI_STS, ack); /* acknowledge the ints */
 
 	sc->sc_bus.no_intrs++;
@@ -1368,6 +1418,9 @@
 
 	DPRINTFN(15, ("%s: uhci_intr: exit\n", device_xname(sc->sc_dev)));
 
+#ifdef BARSOOM
+	callout_schedule(&sc->sc_worker, 1);
+#endif
 	return (1);
 }
 
Index: sys/dev/usb/uhcivar.h
===================================================================
RCS file: /cvsroot/src/sys/dev/usb/uhcivar.h,v
retrieving revision 1.52
diff -u -r1.52 uhcivar.h
--- sys/dev/usb/uhcivar.h	29 Jan 2013 00:00:15 -0000	1.52
+++ sys/dev/usb/uhcivar.h	2 Mar 2015 12:47:25 -0000
@@ -137,6 +137,10 @@
 	bus_space_handle_t ioh;
 	bus_size_t sc_size;
 
+#ifdef BARSOOM
+	struct callout sc_worker;
+#endif
+
 	kmutex_t sc_lock;
 	kmutex_t sc_intr_lock;
 	kcondvar_t sc_softwake_cv;

>Unformatted:
Prev by Date: PR/49708 CVS commit: src/usr.sbin/makemandb
Next by Date: Re: kern/49121 (Starting X on a Lenovo T400 under NetBSD-7 beta panics system -- DRM)
Previous by Thread: PR/49708 CVS commit: src/usr.sbin/makemandb
Next by Thread: PR/49195 CVS commit: src/sys/external/bsd/drm2/dist/drm/i915
Indexes:
Home | Main Index | Thread Index | Old Index