Interrupt storm mitigation needed

To: tech-kern%netbsd.org@localhost
Subject: Interrupt storm mitigation needed
From: Tom Ivar Helbekkmo <tih%hamartun.priv.no@localhost>
Date: Wed, 11 Feb 2015 19:52:11 +0100

Hello,

I'm running NetBSD-current/amd64 on a Dell PowerEdge 2850, and have been
experiencing mysterious hangs.  With help and guidance from Christos
Zoulas, I believe I've figured out what's going on.  What
to do about it is a more difficult question, and Christos suggested we
take the issue to tech-kern.

The behaviour is as follows: when the machine is busy with disk and
network I/O, using the integrated Dell PERC (AMI MegaRAID) "amr" RAID
controller and Intel i8254x "wm" network interfaces, it will suddenly
hang and become completely unresponsive: it ignores keypresses on the
console and does not answer ICMP echo requests on the network.  In
single-CPU mode, these hangs last anywhere from barely noticeable to
almost half a minute.  Reducing the activity on the machine keeps them
from occurring; piling it on again reintroduces them.  In SMP mode, they
are typically much longer: I tend to get only one or two hangs of, say,
ten or twenty seconds, followed by what appears to be a permanent one.
(That said, it looks as if some hangs that happened while I was away
from the machine ended with it resuming operation after about half an
hour.)

Christos and I went through a long sequence of tests and modifications,
during which I, among other things, modified the amr driver to use
mutexes and condvars instead of splbio()/splx(), and added a couple of
bug fixes gleaned from FreeBSD.  In the end, however, it turned out that
the problem is interrupt storms from the integrated USB controller.

Here are some interrupt mappings.  The devices actually in use are amr0,
wm0, and uhci2; the latter runs a 1200 bps serial line over a ucom
device to talk to my UPS, and normally generates about 1200 interrupts
per second doing so:

amr0: interrupting at ioapic1 pin 14
wm0: interrupting at ioapic2 pin 0
wm1: interrupting at ioapic2 pin 1
uhci0: interrupting at ioapic0 pin 16
uhci1: interrupting at ioapic0 pin 19
uhci2: interrupting at ioapic0 pin 18
ehci0: interrupting at ioapic0 pin 23
cmdide0: using ioapic0 pin 23 for native-PCI interrupt
piixide0: primary channel interrupting at ioapic0 pin 14
piixide0: secondary channel interrupting at ioapic0 pin 15
radeon0: interrupting at ioapic0 pin 18 (radeon)
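
Two of these interrupt lines are shared: uhci2 shares ioapic0 pin 18 with radeon0, and ehci0 shares ioapic0 pin 23 with cmdide0.  Since the vmstat output reports one counter per pin, the pin 18 figures presumably include any radeon0 interrupts as well.  The small Python sketch here groups the attach messages by interrupt line to make the sharing explicit; the helper names pins_to_devices and shared_lines are hypothetical.

```python
import re
from collections import defaultdict

# Attach messages copied from the list above.
DMESG = """\
amr0: interrupting at ioapic1 pin 14
wm0: interrupting at ioapic2 pin 0
wm1: interrupting at ioapic2 pin 1
uhci0: interrupting at ioapic0 pin 16
uhci1: interrupting at ioapic0 pin 19
uhci2: interrupting at ioapic0 pin 18
ehci0: interrupting at ioapic0 pin 23
cmdide0: using ioapic0 pin 23 for native-PCI interrupt
piixide0: primary channel interrupting at ioapic0 pin 14
piixide0: secondary channel interrupting at ioapic0 pin 15
radeon0: interrupting at ioapic0 pin 18 (radeon)
"""

# Device name, then the first "ioapicN pin M" on the line.
ATTACH_RE = re.compile(r"^(\w+):.*?\b(ioapic\d+) pin (\d+)\b")


def pins_to_devices(text):
    """Map (ioapic, pin) -> sorted list of devices attached to that line."""
    table = defaultdict(set)
    for line in text.splitlines():
        m = ATTACH_RE.match(line)
        if m:
            table[(m.group(2), int(m.group(3)))].add(m.group(1))
    return {key: sorted(devs) for key, devs in table.items()}


def shared_lines(text):
    """Keep only the interrupt lines used by more than one device."""
    return {key: devs for key, devs in pins_to_devices(text).items()
            if len(devs) > 1}


for (apic, pin), devs in sorted(shared_lines(DMESG).items()):
    print(f"{apic} pin {pin}: {', '.join(devs)}")
```

Note that amr0 on ioapic1 pin 14 and piixide0 on ioapic0 pin 14 are on different I/O APICs, so they do not share a line.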

Now, here are some counters from around a hang.  It was a three-second
hang, and a "vmstat -i 10" that was running showed the count jump by
about 14000 on uhci2 and 13000 on uhci0 during the interval in which the
hang occurred.  In the next 10-second interval, they were back to normal.

interrupt                                     total     rate
cpu0 timer                                    89770      100
ioapic1 pin 14                                26260       29
ioapic2 pin 0                                 70060       78
ioapic0 pin 16                                 1558        1
ioapic0 pin 18                                61740       68
ioapic0 pin 23                                20928       23
ioapic0 pin 14                                    6        0
ioapic0 pin 4                                   371        0
Total                                        270693      302

interrupt                                     total     rate
cpu0 timer                                    90771      100
ioapic1 pin 14                                28832       31
ioapic2 pin 0                                 70768       78
ioapic0 pin 16                                14920       16
ioapic0 pin 18                                76075       84
ioapic0 pin 23                                21079       23
ioapic0 pin 14                                    6        0
ioapic0 pin 4                                   377        0
Total                                        302828      334

interrupt                                     total     rate
cpu0 timer                                    91772      100
ioapic1 pin 14                                30682       33
ioapic2 pin 0                                 71440       78
ioapic0 pin 16                                14964       16
ioapic0 pin 18                                77350       84
ioapic0 pin 23                                21313       23
ioapic0 pin 14                                    6        0
ioapic0 pin 4                                   377        0
Total                                        307904      336
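
The "total" column in these snapshots is cumulative, and the "rate" column appears to be an average since boot (for example, 61740 interrupts over roughly 898 seconds of uptime gives the 68 shown for pin 18), so the storm only stands out when consecutive snapshots are subtracted.  This Python sketch does that arithmetic, timing each interval by the cpu0 timer delta.  The 100 Hz clock rate is an assumption read off that row's rate column, and the function names are hypothetical.

```python
HZ = 100  # assumed clock rate: the "cpu0 timer" row shows a rate of 100

COUNTERS = ["cpu0 timer", "ioapic1 pin 14", "ioapic2 pin 0", "ioapic0 pin 16",
            "ioapic0 pin 18", "ioapic0 pin 23", "ioapic0 pin 14", "ioapic0 pin 4"]

# "total" column of each of the three snapshots, in COUNTERS order.
TOTALS = [
    [89770, 26260, 70060, 1558, 61740, 20928, 6, 371],
    [90771, 28832, 70768, 14920, 76075, 21079, 6, 377],
    [91772, 30682, 71440, 14964, 77350, 21313, 6, 377],
]


def interval_deltas(prev, cur):
    """Interrupts each source took between two cumulative snapshots."""
    return {name: c - p for name, p, c in zip(COUNTERS, prev, cur)}


def per_second(delta):
    """Turn interval counts into rates, timing the interval by timer ticks."""
    seconds = delta["cpu0 timer"] / HZ
    return {name: count / seconds for name, count in delta.items()}


for prev, cur in zip(TOTALS, TOTALS[1:]):
    d = interval_deltas(prev, cur)
    r = per_second(d)
    for name in ("ioapic0 pin 16", "ioapic0 pin 18"):
        print(f"{name}: {d[name]:6d} interrupts ({r[name]:.0f}/s)")
    print()
```

For the interval containing the hang, this gives 13362 interrupts on pin 16 (uhci0) and 14335 on pin 18 (uhci2).  For the following interval, the figures are 44 and 1275.  The per-interval deltas also add up exactly to the change in the "Total" row, which is a quick sanity check on the transcription.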

It turns out to be a known problem with the particular Intel chip set
Dell used in all its servers at the time this machine was built.  See,
for instance, these references:

https://lists.freebsd.org/pipermail/freebsd-hardware/2005-June/002601.html
http://freebsd.1045724.n5.nabble.com/em-interrupt-storm-td3877379.html

I think what's needed here is interrupt storm mitigation, perhaps along
the lines of what FreeBSD does in ithread_execute_handlers() in this
source file:

https://svnweb.freebsd.org/base/stable/10/sys/kern/kern_intr.c?view=co
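
As far as I can tell from that file, the FreeBSD idea is roughly this.  Count how many times an interrupt source's handlers run back to back without its interrupt thread going idle.  Once that count reaches a threshold (the hw.intr_storm_threshold sysctl), sleep for one clock tick after each further run.  Reset the count when the thread finally goes idle.  The Python sketch here simulates that logic in user space.  It is not a port of the FreeBSD code: the class and method names are hypothetical, and the 1000 default is just a value chosen for the sketch.

```python
class StormDetector:
    """Back-to-back run counter for one interrupt source (hypothetical names).

    Loosely modelled on FreeBSD's scheme; a user-space simulation only.
    """

    def __init__(self, threshold=1000):
        self.threshold = threshold  # 0 disables throttling
        self.count = 0              # handler runs since the thread last went idle

    def after_handlers(self):
        """Call after each run of the source's handlers.

        Returns True when the caller should sleep for one clock tick
        before servicing the source again.
        """
        if self.threshold and self.count >= self.threshold:
            return True             # storming: throttle this run
        self.count += 1
        return False

    def went_idle(self):
        """No work is pending any more: the burst is over, so forget it."""
        self.count = 0


def simulate(bursts, threshold=1000):
    """Replay bursts of back-to-back interrupts; return ticks spent throttled.

    Each entry in `bursts` is a burst length; the thread goes idle between bursts.
    """
    det = StormDetector(threshold)
    throttled_ticks = 0
    for length in bursts:
        for _ in range(length):
            # ... the source's handlers would run here ...
            if det.after_handlers():
                throttled_ticks += 1  # stands in for sleeping one tick
        det.went_idle()
    return throttled_ticks


print(simulate([50, 2500, 10]))  # prints 1500
```

Here the 2500-interrupt burst gets its first 1000 handler runs unthrottled, and each of the remaining 1500 costs a tick, while the two short bursts are never delayed.  With one tick of sleep per throttled run, a storming source gets its handlers run at most about HZ times per second, which leaves the CPU free for everything else.  Note that this relies on the code that calls the handlers being allowed to sleep.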

However, I've been unable to figure out where to trap and throttle a
storm in NetBSD.  I had fun barking up the wrong tree when I discovered
the softint subsystem, and instrumented that to (successfully) keep
track of back-to-back invocations of the same handler, but all that
really taught me was that my hunch, that hardware interrupt handling
passes through that layer, was wrong.  :)

What do people think?  Am I on the right track when I think interrupt
storm mitigation is the way to go?  If so, where would be the right
place to do it?

I'm happy to do the work, but will need some guidance along the way.

-tih
-- 
Popularity is the hallmark of mediocrity.  --Niles Crane, "Frasier"
