NetBSD-Bugs archive


port-amd64/42626: interrupt storm if booting with multiple CPUs



>Number:         42626
>Category:       port-amd64
>Synopsis:       interrupt storm if booting with multiple CPUs
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    port-amd64-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Jan 16 13:35:00 +0000 2010
>Originator:     Martin Husemann
>Release:        NetBSD 5.99.23
>Organization:
The NetBSD Foundation, Inc.
>Environment:
System: NetBSD martins.aprisoft.de 5.99.23 NetBSD 5.99.23 (MARTINS) #82: Sat Jan 16 13:54:41 CET 2010 martin%martins.aprisoft.de@localhost:/usr/src/sys/arch/amd64/compile/MARTINS amd64
Architecture: x86_64
Machine: amd64
>Description:

Since early December 2008 I have been unable to boot this machine with
multiple CPUs enabled. Booting with -1 makes it work. This was previously
tracked as PR #40159, which has since been split into several different
issues, leaving this one.

When I boot without -1, the machine creates an interrupt storm (on the
order of 100000 interrupts/s).

When I break into ddb, the stack trace always goes through Xioapic_intr_edge7
plus an offset (depending on kernel version) in the range of 0xc0 - 0xff.

The interrupt handlers served by this are gem @ pci and pciide_common
(from viaide @ pci). Neither is edge triggered, and neither device is
used at all.

I think nfe1 (via ioapic3) is also involved, but testing is fragile and
I am not really sure.

Full dmesg and ioapic_dump() output below - note there is no mention of
edge triggered interrupts anywhere.

kernel text is mapped with 2 large pages and 283 normal pages
Loaded initial symtab at 0xffffffff805c3420, strtab at 0xffffffff8060f1d0, # entries 12873
Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
    2006, 2007, 2008, 2009, 2010
    The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
    The Regents of the University of California.  All rights reserved.

NetBSD 5.99.23 (MARTINS) #82: Sat Jan 16 13:54:41 CET 2010
        martin%martins.aprisoft.de@localhost:/usr/src/sys/arch/amd64/compile/MARTINS
total memory = 4094 MB
avail memory = 3961 MB
mainbus0 (root)
cpu0 at mainbus0 apid 0: AMD 686-class, 2210MHz, id 0xf5a
cpu0: WARNING: errata present, BIOS upgrade may be
cpu0: WARNING: necessary to ensure reliable operation
cpu1 at mainbus0 apid 1: AMD 686-class, 2210MHz, id 0xf5a
ioapic0 at mainbus0 apid 2
ioapic1 at mainbus0 apid 3
ioapic2 at mainbus0 apid 4
ioapic3 at mainbus0 apid 5
acpi0 at mainbus0: Intel ACPICA 20090730
acpibut0 at acpi0 (PWRB, PNP0C0C): ACPI Power Button
attimer0 at acpi0 (PIT0, PNP0100): io 0x40-0x43 irq 0
pcppi0 at acpi0 (SPK0, PNP0800): io 0x61
sysbeep0 at pcppi0
com0 at acpi0 (COM1, PNP0501-1): io 0x3f8-0x3ff irq 4
com: ns16550a, working fifo
com0: console
FDC (PNP0700) at acpi0 not configured
pckbc0 at acpi0 (PS2K, PNP0303) (kbd port): io 0x60,0x64 irq 1
pckbc1 at acpi0 (PS2M, PNP0F13) (aux port): irq 12
NVRB (_NVRAIDBUS) at acpi0 not configured
attimer0: attached to pcppi0
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0 mux 1
pci0 at mainbus0 bus 0: configuration mode 1
vendor 0x10de product 0x005e (miscellaneous memory, revision 0xa3) at pci0 dev 0 function 0 not configured
pcib0 at pci0 dev 1 function 0: vendor 0x10de product 0x0051 (rev. 0xa3)
vendor 0x10de product 0x0052 (SMBus serial bus, revision 0xa2) at pci0 dev 1 function 1 not configured
ohci0 at pci0 dev 2 function 0: vendor 0x10de product 0x005a (rev. 0xa2)
ohci0: interrupting at ioapic0 pin 20
ohci0: OHCI version 1.0, legacy support
usb0 at ohci0: USB revision 1.0
ehci0 at pci0 dev 2 function 1: vendor 0x10de product 0x005b (rev. 0xa3)
ehci0: interrupting at ioapic0 pin 21
ehci0: BIOS refuses to give up ownership, using force
ehci0: companion controller, 4 ports each: ohci0
usb1 at ehci0: USB revision 2.0
auich0 at pci0 dev 4 function 0: nForce4 AC-97 Audio
auich0: interrupting at ioapic0 pin 22
auich0: ac97: Analog Devices AD1981B codec; headphone, 20 bit DAC, no 3D stereo
auich0: ac97: ext id 0x605<AC97_22,AMAP,SPDIF,VRA>
viaide0 at pci0 dev 6 function 0: NVIDIA nForce4 IDE Controller (rev. 0xf2)
viaide0: couldn't map native-PCI interrupt
viaide0: couldn't map native-PCI interrupt
viaide1 at pci0 dev 7 function 0: NVIDIA nForce4 Serial ATA Controller (rev. 0xf3)
viaide1: using ioapic0 pin 23 for native-PCI interrupt
atabus0 at viaide1 channel 0
atabus1 at viaide1 channel 1
viaide2 at pci0 dev 8 function 0: NVIDIA nForce4 Serial ATA Controller (rev. 0xf3)
viaide2: using ioapic0 pin 20 for native-PCI interrupt
atabus2 at viaide2 channel 0
atabus3 at viaide2 channel 1
ppb0 at pci0 dev 9 function 0: vendor 0x10de product 0x005c (rev. 0xa2)
pci1 at ppb0 bus 1
nfe0 at pci0 dev 10 function 0: vendor 0x10de product 0x0057 (rev. 0xa3)
nfe0: interrupting at ioapic0 pin 21
nfe0: Ethernet address 00:e0:81:54:9d:e8
makphy0 at nfe0 phy 1: Marvell 88E1111 Gigabit PHY, rev. 1
makphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
ppb1 at pci0 dev 14 function 0: vendor 0x10de product 0x005d (rev. 0xa3)
pci2 at ppb1 bus 2
vga0 at pci2 dev 0 function 0: vendor 0x1002 product 0x5e4b (rev. 0x00)
wsdisplay0 at vga0 kbdmux 1
drm at vga0 not configured
vendor 0x1002 product 0x5e6b (miscellaneous display) at pci2 dev 0 function 1 not configured
pchb0 at pci0 dev 24 function 0: vendor 0x1022 product 0x1100 (rev. 0x00)
pchb1 at pci0 dev 24 function 1: vendor 0x1022 product 0x1101 (rev. 0x00)
pchb2 at pci0 dev 24 function 2: vendor 0x1022 product 0x1102 (rev. 0x00)
pchb3 at pci0 dev 24 function 3: vendor 0x1022 product 0x1103 (rev. 0x00)
pchb4 at pci0 dev 25 function 0: vendor 0x1022 product 0x1100 (rev. 0x00)
pchb5 at pci0 dev 25 function 1: vendor 0x1022 product 0x1101 (rev. 0x00)
pchb6 at pci0 dev 25 function 2: vendor 0x1022 product 0x1102 (rev. 0x00)
pchb7 at pci0 dev 25 function 3: vendor 0x1022 product 0x1103 (rev. 0x00)
isa0 at pcib0
smsc0 at isa0 port 0x2e-0x2f: SMSC LPC47B397 Super I/O (rev 1)
smsc0: Hardware Monitor registers at 0x0480
pci3 at mainbus0 bus 17
pci4 at mainbus0 bus 18
gem0 at pci4 dev 4 function 0: vendor 0x108e product 0x2bad (rev. 0x01)
gem0: interrupting at ioapic2 pin 0
gem0: using external PCS SERDES: 1000baseSX-FDX, 1000baseSX-HDX, auto
gem0: Ethernet address 00:03:ba:1c:af:40, 20KB RX fifo, 9KB TX fifo
pci5 at mainbus0 bus 128
vendor 0x10de product 0x005e (miscellaneous memory, revision 0xa3) at pci5 dev 0 function 0 not configured
vendor 0x10de product 0x00d3 (miscellaneous memory, revision 0xa3) at pci5 dev 1 function 0 not configured
nfe1 at pci5 dev 10 function 0: vendor 0x10de product 0x0057 (rev. 0xa3)
nfe1: interrupting at ioapic3 pin 20
nfe1: Ethernet address 00:e0:81:54:9d:e9
makphy1 at nfe1 phy 1: Marvell 88E1111 Gigabit PHY, rev. 1
makphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
ppb2 at pci5 dev 14 function 0: vendor 0x10de product 0x005d (rev. 0xa3)
pci6 at ppb2 bus 129
Initializing SSP: 9abc02804a28e7aa 4892821332326f14 4735952fa78b7d66 f44a7a23e6401113 87103fc9779dfb02 476345ba5cd20f73 ea9ea34c9225d15a 9899120b490f78a9
auich0: measured ac97 link rate at 47998 Hz, will use 48000 Hz
audio0 at auich0: full duplex, playback, capture, mmap, independent
uhub0 at usb0: vendor 0x10de OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1 at usb1: vendor 0x10de EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
viaide2 port 1: device present, speed: 1.5Gb/s
viaide1 port 1: device present, speed: 1.5Gb/s
viaide2 port 0: device present, speed: 1.5Gb/s
wd0 at atabus1 drive 0: <WDC WD1600JD-00HBB0>
wd0: 149 GB, 310101 cyl, 16 head, 63 sec, 512 bytes/sect x 312581808 sectors
wd1 at atabus2 drive 0: <WDC WD740GD-00FLA2>
wd1: 70911 MB, 144073 cyl, 16 head, 63 sec, 512 bytes/sect x 145226112 sectors
wd2 at atabus3 drive 0: <WDC WD740GD-00FLA2>
wd2: 70911 MB, 144073 cyl, 16 head, 63 sec, 512 bytes/sect x 145226112 sectors
ehci0: handing over full speed device on port 1 to ohci0
ehci0: handing over low speed device on port 3 to ohci0
boot device: wd0
root device (default wd0a): ddb
fatal breakpoint trap in supervisor mode
trap type 1 code 0 rip ffffffff80151745 cs 8 rflags 202 cr2  0 cpl 0 rsp ffffffff80646760
Stopped in pid 0.1 (system) at  netbsd:breakpoint+0x5:  leave
db{1}> call ioapic_dump
ioapic0: dump1 0x61<vector=0x61,delmode=0x0,dest=0x0> 0x0<target=0x0>
ioapic0: dump4 0x81<vector=0x81,delmode=0x0,dest=0x0> 0x0<target=0x0>
ioapic0: dump9 0xa060<vector=0x60,delmode=0x0,actlo,level,dest=0x0> 0x0<target=0x0>
ioapic0: dump20 0x18062<vector=0x62,delmode=0x0,level,masked,dest=0x0> 0x0<target=0x0>
ioapic0: dump21 0x8063<vector=0x63,delmode=0x0,level,dest=0x0> 0x0<target=0x0>
ioapic0: dump22 0x8064<vector=0x64,delmode=0x0,level,dest=0x0> 0x0<target=0x0>
ioapic0: dump23 0x8065<vector=0x65,delmode=0x0,level,dest=0x0> 0x0<target=0x0>
ioapic2: dump0 0xa066<vector=0x66,delmode=0x0,actlo,level,dest=0x0> 0x0<target=0x0>
ioapic3: dump20 0x8067<vector=0x67,delmode=0x0,level,dest=0x0> 0x0<target=0x0>
0xffff800007e50c00
db{1}> call ioapic_dump_raw
Register dump of ioapic0
00 02000000 00170011 02000000 00000000 00000000 00000000 00000000 00000000
08 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
10 00010000 00000000 00000061 00000000 00010000 00000000 00010000 00000000
18 00000081 00000000 00010000 00000000 00010000 00000000 00010000 00000000
20 00010000 00000000 0000a060 00000000 00010000 00000000 00010000 00000000
28 00010000 00000000 00010000 00000000 00010000 00000000 00010000 00000000
30 0001a000 00000000 0001a000 00000000 0001a000 00000000 0001a000 00000000
38 00018062 00000000 00008063 00000000 00008064 00000000 00008065 00000000
Register dump of ioapic1
00 03000000 00030011 00000000 00000000 00000000 00000000 00000000 00000000
08 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
10 00010000 00000000 00010000 00000000 00010000 00000000 00010000 00000000
18 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
20 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
28 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
30 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
38 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Register dump of ioapic2
00 04000000 00030011 00000000 00000000 00000000 00000000 00000000 00000000
08 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
10 0000a066 00000000 00010000 00000000 00010000 00000000 00010000 00000000
18 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
20 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
28 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
30 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
38 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Register dump of ioapic3
00 05000000 00170011 05000000 00000000 00000000 00000000 00000000 00000000
08 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
10 00010000 00000000 00010000 00000000 00010000 00000000 00010000 00000000
18 00010000 00000000 00010000 00000000 00010000 00000000 00010000 00000000
20 00010000 00000000 00010000 00000000 00010000 00000000 00010000 00000000
28 00010000 00000000 00010000 00000000 00010000 00000000 00010000 00000000
30 0001a000 00000000 0001a000 00000000 0001a000 00000000 0001a000 00000000
38 00008067 00000000 0001a000 00000000 0001a000 00000000 0001a000 00000000
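
As a reading aid for the two dumps above, here is a minimal sketch (not part
of the kernel or of the original report) of how the low word of each
redirection entry decodes, assuming the standard Intel 82093AA I/O APIC
register layout. All function and constant names are hypothetical; the sample
values are copied from the dumps. Under that layout an entry with bit 15
clear is edge triggered, which appears to be how pins 1 and 4 of ioapic0
(0x61 and 0x81) decode, while the other unmasked entries are level triggered.

```python
# Sketch only: decodes I/O APIC redirection entries as shown by
# ioapic_dump / ioapic_dump_raw, assuming the Intel 82093AA layout.
# Names below are illustrative and do not come from the NetBSD sources.

ACTIVE_LOW = 1 << 13   # interrupt input pin polarity (set = active low)
LEVEL = 1 << 15        # trigger mode (set = level, clear = edge)
MASKED = 1 << 16       # interrupt mask


def decode_redir_lo(value):
    """Decode the low 32 bits of a redirection table entry."""
    return {
        "vector": value & 0xFF,
        "delmode": (value >> 8) & 0x7,
        "polarity": "low" if value & ACTIVE_LOW else "high",
        "trigger": "level" if value & LEVEL else "edge",
        "masked": bool(value & MASKED),
    }


def redir_lo_index(pin):
    """Register index of the low word for a pin in the raw dump."""
    return 0x10 + 2 * pin


def pin_count(version_reg):
    """Number of pins, from register 0x01 (max entry index is bits 16-23)."""
    return ((version_reg >> 16) & 0xFF) + 1


if __name__ == "__main__":
    # (ioapic, pin, low word) taken from the dump in this report.
    samples = [
        ("ioapic0", 1, 0x61),
        ("ioapic0", 4, 0x81),
        ("ioapic0", 9, 0xA060),
        ("ioapic0", 20, 0x18062),
        ("ioapic2", 0, 0xA066),
        ("ioapic3", 20, 0x8067),
    ]
    for name, pin, lo in samples:
        print(name, "pin", pin, "reg", hex(redir_lo_index(pin)),
              decode_redir_lo(lo))
```

With this mapping, pin 20 lands at register 0x38, which matches the first
word of the "38" row in the raw dumps of ioapic0 and ioapic3.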


>How-To-Repeat:
No idea; simply booting GENERIC on my machine triggers it ;-)

>Fix:
Any hints welcome.


