Date: Fri, 25 Jan 2008 15:23:21 +0000
From: Steven M. Bellovin <smb%cs.columbia.edu@localhost>
To: current-users%netbsd.org@localhost
Subject: interrupt storm after resume on Thinkpad T61
Any thoughts on the interrupt storm problem?  Here's a current 'top'
from my machine:
load averages:  0.02,  0.05,  0.01                  up 0 days, 11:48   10:17:11
67 processes:  66 sleeping, 1 on CPU
CPU0 states:  0.5% user,  0.0% nice,  0.2% system, 75.3% interrupt, 23.9% idle
CPU1 states:  0.2% user,  0.0% nice,  0.3% system,  0.0% interrupt, 99.5% idle
Memory: 435M Act, 205M Inact, 6976K Wired, 61M Exec, 180M File, 1772M Free
Swap: 4097M Total, 4097M Free
 PID USERNAME PRI NICE   SIZE   RES STATE      TIME   WCPU    CPU COMMAND
20896 smb       85    0    19M   95M select/1   0:23  0.29%  0.29% claws-mail
26181 smb       85    0   404K 6164K select/0   0:00  0.11%  0.10% xterm
 1266 smb       85    0  1904K   85M select/1   3:30  0.00%  0.00% Xorg
 1149 smb       85    0    16M  214M select/1   2:29  0.00%  0.00% firefox-bin
From 'vmstat -i', I see that it's ioapic0 pin 17, averaging about 2865 interrupts per second:
# vmstat -i
interrupt                                     total     rate
softint net/0                                141892        3
softint bio/0                                212461        5
softint bio block/0                             290        0
softint clk/0                               2322719       54
cpu0 timer                                 40316586      949
cpu0 FPU flush IPI                               84        0
cpu0 FPU synch IPI                             5110        0
cpu0 MTRR update IPI                              2        0
cpu0 MSR write IPI                              105        0
global TLB IPI                              1921145       45
cpu0 TLB IPI                                  62927        1
softint net/1                                 30950        0
softint clk/1                                    30        0
cpu1 timer                                 40293969      948
cpu1 FPU flush IPI                              173        0
cpu1 FPU synch IPI                             6665        0
cpu1 MTRR update IPI                             21        0
cpu1 MSR write IPI                              866        0
cpu1 ACPI CPU sleep IPI                           2        0
cpu1 TLB IPI                                  56061        1
ioapic0 pin 9                                286526        6
ioapic0 pin 1                                 10241        0
ioapic0 pin 12                               418841        9
ioapic0 pin 20                                47663        1
ioapic0 pin 22                                   70        0
ioapic0 pin 17                            121743653     2865
ioapic0 pin 19                                    5        0
ioapic0 pin 14                               211279        4
Total                                     208090336     4898
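A small script can pick out the busiest source from this output instead of relying on eyeballing the table. The following is a minimal Python sketch, assuming the NetBSD column layout shown above (source name, then cumulative total, then rate). The function names are hypothetical, and SAMPLE is a trimmed excerpt of the output above.

```python
# Hypothetical helper: report the busiest interrupt source in `vmstat -i` output.
# Assumes each data row ends with two integer columns: cumulative total, then rate.

# Trimmed excerpt of the output above.
SAMPLE = """\
# vmstat -i
interrupt                                     total     rate
softint clk/0                               2322719       54
cpu0 timer                                 40316586      949
cpu1 timer                                 40293969      948
ioapic0 pin 12                               418841        9
ioapic0 pin 17                            121743653     2865
ioapic0 pin 14                               211279        4
Total                                     208090336     4898
"""


def parse_vmstat_i(text):
    """Return (source, total, rate) tuples, skipping the header and Total rows."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] in ("interrupt", "Total"):
            continue
        try:
            total, rate = int(fields[-2]), int(fields[-1])
        except ValueError:
            continue  # not a data row, e.g. the "# vmstat -i" prompt line
        rows.append((" ".join(fields[:-2]), total, rate))
    return rows


def busiest(rows):
    """Return the row with the highest rate."""
    return max(rows, key=lambda row: row[2])


if __name__ == "__main__":
    name, total, rate = busiest(parse_vmstat_i(SAMPLE))
    print(f"{name}: {rate}/s ({total} total)")
```

Run on the excerpt, it prints "ioapic0 pin 17: 2865/s (121743653 total)".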
The dmesg output shows four devices sharing that pin:
azalia0: interrupting at ioapic0 pin 17 (irq 11)
wpi0: interrupting at ioapic0 pin 17 (irq 11)
uhci3: interrupting at ioapic0 pin 17 (irq 11)
fwohci0: interrupting at ioapic0 pin 17 (irq 11)
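Any of those four drivers could be responsible. On a longer dmesg, grouping the "interrupting at" lines by pin makes this kind of sharing easy to spot. A minimal Python sketch, assuming the line format shown above; the helper names are hypothetical:

```python
import re
from collections import defaultdict

# Matches autoconf lines of the form shown above, e.g.
# "azalia0: interrupting at ioapic0 pin 17 (irq 11)".
INTR_RE = re.compile(r"^(\w+): interrupting at (\S+ pin \d+)")

SAMPLE = """\
azalia0: interrupting at ioapic0 pin 17 (irq 11)
wpi0: interrupting at ioapic0 pin 17 (irq 11)
uhci3: interrupting at ioapic0 pin 17 (irq 11)
fwohci0: interrupting at ioapic0 pin 17 (irq 11)
"""


def devices_by_pin(dmesg_text):
    """Map each interrupt pin to the devices attached to it, in dmesg order."""
    pins = defaultdict(list)
    for line in dmesg_text.splitlines():
        match = INTR_RE.match(line.strip())
        if match:
            device, pin = match.groups()
            pins[pin].append(device)
    return dict(pins)


def shared_pins(mapping):
    """Keep only pins with more than one device attached."""
    return {pin: devs for pin, devs in mapping.items() if len(devs) > 1}


if __name__ == "__main__":
    for pin, devs in shared_pins(devices_by_pin(SAMPLE)).items():
        print(f"{pin}: {', '.join(devs)}")
```

This only lists which drivers share a pin; it cannot tell which device is actually asserting the interrupt.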
Based on comments at Thinkwiki.org and
https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.22/+bug/126369
I suspect a USB interrupt problem.  However, I'm running an even newer BIOS
than the one that is claimed to fix the problem, and I'm still seeing it.
I have not tried moving anything off of IRQ 11.  Might that help?  If
so, what should I try?  A different value?  Auto?
It would be nice if NetBSD could detect flaky hardware and disable a
device that generates so many interrupts.  That kind of CPU usage will kill
battery life (I'm running estd, and it's at maximum frequency), heat
the machine up, and slow down real applications.
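As a rough illustration of the detection half of that idea, here is a hypothetical userland sketch in Python (not NetBSD kernel code). It differences successive cumulative counter readings, such as the totals from 'vmstat -i', to get the current rate, and flags a source once that rate stays above a threshold for several consecutive samples. The class name, the 1000 interrupts/s threshold and the three-sample window are illustrative assumptions.

```python
class StormDetector:
    """Hypothetical sketch: flag an interrupt source whose rate stays above
    `threshold` interrupts/s for `window` consecutive samples.
    Illustrative only; not NetBSD kernel code, and the defaults are guesses."""

    def __init__(self, threshold=1000.0, window=3):
        self.threshold = threshold
        self.window = window
        self._prev_total = None  # last cumulative counter reading
        self._strikes = 0        # consecutive over-threshold samples

    def sample(self, total, interval_s):
        """Feed a cumulative count taken interval_s seconds after the previous
        reading; return True once the source counts as storming."""
        if self._prev_total is None:
            self._prev_total = total  # first reading only sets the baseline
            return False
        rate = (total - self._prev_total) / interval_s
        self._prev_total = total
        # Any sample at or below the threshold resets the streak.
        self._strikes = self._strikes + 1 if rate > self.threshold else 0
        return self._strikes >= self.window


if __name__ == "__main__":
    # Four one-second readings rising by 2865 per second,
    # the average rate reported for pin 17 above.
    detector = StormDetector()
    base = 121743653
    for i in range(4):
        print(i, detector.sample(base + 2865 * i, 1.0))
```

Differencing matters because the rate column of 'vmstat -i' is the total divided by uptime (here roughly 11h48m, about 42,480 s), so a storm that started at resume is understated there. Requiring consecutive strikes keeps a brief legitimate burst from being flagged.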
                --Steve Bellovin, http://www.cs.columbia.edu/~smb