Port-i386 archive


Re: re(4) 100% interrupt load



Hi

I have seen this condition again: on netbsd-4 the interrupt load on CPU0 reached 100% and all network access on re1 was lost. Running tcpdump restores network access. This time I was able to capture vmstat(1) output while the system was in this state; it is attached below, together with a second vmstat(1) capture taken once network access was restored. The two look very similar, and the remarkable thing in both is the interrupt rate on pin 17 (which is mapped to re1). To me this seems quite high, but maybe it is normal, as I am no expert. The device is receiving about 64 Mbps of UDP data and transmitting very little.

Has anyone else experienced this? Could this be a network device problem, or a driver implementation problem? Any ideas would be appreciated, as I am at a loss as to how to go about debugging this. I have seen this on 3 different systems here, and in every case re0 seems to be fine while re1 is the device that locks up.

Thanks,
Ian


=====  vmstat when network is *not* working =====

interrupt                                     total     rate
cpu0 softclock                               659928       75
cpu0 softnet                                5974162      686
cpu0 timer                                   852988       98
cpu0 FPU flush IPI                                2        0
cpu0 FPU synch IPI                              349        0
cpu0 TLB shootdown IPI                        62791        7
cpu0 MTRR update IPI                             15        0
cpu1 softnet                                  29187        3
cpu1 timer                                   822884       94
cpu1 FPU flush IPI                                9        0
cpu1 FPU synch IPI                              757        0
cpu1 TLB shootdown IPI                       354623       40
cpu1 MTRR update IPI                              3        0
ioapic0 pin 16                              1576264      181
ioapic0 pin 17                             61671823     7086
ioapic0 pin 19                                21306        2
ioapic0 pin 21                                   44        0
ioapic0 pin 1                                    18        0
Total                                      72027153     8276
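For context on the pin 17 rate in the table above, the "about 64Mbps" of UDP from the message can be turned into a rough packet rate and compared with the interrupt rate. This is only a back-of-the-envelope sketch: the datagram size is not stated anywhere in the post, so the 1000-byte payload below is purely an assumption, header overhead is ignored, and Mbps is taken as 10^6 bits per second.

```python
def packets_per_second(bits_per_second, payload_bytes):
    """Approximate datagrams per second, ignoring Ethernet/IP/UDP headers."""
    return bits_per_second / (8 * payload_bytes)


RX_BITS_PER_SECOND = 64_000_000  # "about 64Mbps" from the message
PIN17_RATE = 7086                # ioapic0 pin 17 "rate" column, first table
ASSUMED_PAYLOAD_BYTES = 1000     # hypothetical: not given in the message

pps = packets_per_second(RX_BITS_PER_SECOND, ASSUMED_PAYLOAD_BYTES)
interrupts_per_packet = PIN17_RATE / pps

print(f"assumed payload:       {ASSUMED_PAYLOAD_BYTES} bytes")
print(f"estimated packet rate: {pps:.0f} packets/s")
print(f"interrupts per packet: {interrupts_per_packet:.2f}")
```

With a different payload size the ratio changes proportionally, so the result says more about the order of magnitude than about whether the rate is abnormal.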

=====  vmstat when network is working =====

interrupt                                     total     rate
cpu0 softclock                               719491       76
cpu0 softnet                                8892912      944
cpu0 timer                                   922282       98
cpu0 FPU flush IPI                                2        0
cpu0 FPU synch IPI                              382        0
cpu0 TLB shootdown IPI                        76303        8
cpu0 MTRR update IPI                             15        0
cpu1 softnet                                  41077        4
cpu1 timer                                   889836       94
cpu1 FPU flush IPI                                9        0
cpu1 FPU synch IPI                              816        0
cpu1 TLB shootdown IPI                       445811       47
cpu1 MTRR update IPI                              3        0
ioapic0 pin 16                              2296534      244
ioapic0 pin 17                             67078160     7127
ioapic0 pin 19                                21376        2
ioapic0 pin 21                                   44        0
ioapic0 pin 1                                    18        0
Total                                      81385071     8647
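Because the "total" column is cumulative, and the "rate" column appears to be the average since boot (an assumption about how vmstat(1) computes it), the two snapshots may be easier to compare by subtracting totals. The sketch below does that for a few rows copied from the two tables above; the function names are made up for illustration. Dividing by the cpu0 timer delta gives interrupts per timer tick without needing the exact time between the snapshots.

```python
NOT_WORKING = """\
cpu0 softnet                                5974162      686
cpu0 timer                                   852988       98
ioapic0 pin 16                              1576264      181
ioapic0 pin 17                             61671823     7086
"""

WORKING = """\
cpu0 softnet                                8892912      944
cpu0 timer                                   922282       98
ioapic0 pin 16                              2296534      244
ioapic0 pin 17                             67078160     7127
"""


def parse_totals(text):
    """Map each interrupt source name to its cumulative total."""
    totals = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # The name may contain spaces; the last two fields are total and rate.
        name, total, _rate = line.rsplit(None, 2)
        totals[name.strip()] = int(total)
    return totals


def snapshot_deltas(before_text, after_text):
    """Interrupts counted between the two snapshots, per source."""
    before = parse_totals(before_text)
    after = parse_totals(after_text)
    return {name: after[name] - before[name] for name in before}


deltas = snapshot_deltas(NOT_WORKING, WORKING)
timer_ticks = deltas["cpu0 timer"]
for name, count in deltas.items():
    print(f"{name:16s} {count:10d}  {count / timer_ticks:8.2f} per cpu0 timer tick")
```

The interval between the two snapshots covers an unknown mix of locked-up and recovered time, so these deltas describe the interval as a whole rather than either state on its own.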



Hi

I have experienced a problem with dual onboard 1Gb re(4) network devices.
Sometimes one of the devices (in the latest instance it was re1) will
'lock up' and all network function on this device is lost. I am running a
netbsd-4 GENERIC.MP kernel on a dual-core PC (relevant dmesg below). When in
this state, top(1) shows the CPU at 0% idle with a 100% interrupt load. If
you then run tcpdump(8) on the device, everything recovers (I assume because
the device is re-initialised) and the network restores itself. It seems to
happen when using both devices simultaneously with some network load (about
20-40Mbps per device) and happens very seldom (I have seen it twice in 2
weeks of running). Unfortunately I have no vmstat(1) output, so I do not know
if there was an interrupt storm happening at the time of the lock-up. My
question is, has anybody else seen behaviour such as this?

Thanks,
Ian

Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
2006, 2007
The NetBSD Foundation, Inc. All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California. All rights reserved.

NetBSD 4.0_STABLE (GENERIC.MP) #2: Wed Mar 18 21:17:22 SAST 2009
root@netbsdtemplate:/GENERIC.MP
total memory = 503 MB
rbus: rbus_min_start set to 0x40000000
avail memory = 484 MB
timecounter: Timecounters tick every 10.000 msec
timecounter: Timecounter "i8254" frequency 1193182 Hz quality 100
BIOS32 rev. 0 found at 0xfb400
mainbus0 (root)
ACPI Error (tbxfroot-0775): No valid RSDP was found [20060217]
ACPI Exception (tbxfroot-0531): AE_NOT_FOUND, RSDP structure not found - Flags=8 [20060217]
ACPI Exception (tbxface-0162): AE_NO_ACPI_TABLES, Could not get the RSDP [20060217]
ACPI Exception (tbxface-0211): AE_NO_ACPI_TABLES, Could not load tables [20060217]
ACPI: unable to load tables: AE_NO_ACPI_TABLES
mainbus0: Intel MP Specification (Version 1.4) (OEM00000 PROD00000000)
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel Core 2 (Merom) (686-class), 1995.35 MHz, id 0x6fd
cpu0: features bfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu0: features bfebfbff<PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX>
cpu0: features bfebfbff<FXSR,SSE,SSE2,SS,HTT,TM,SBF>
cpu0: features2 e39d<SSE3,MONITOR,DS-CPL,EST,TM2,xTPR>
cpu0: "Intel(R) Pentium(R) Dual CPU E2180 @ 2.00GHz"
cpu0: I-cache 32 KB 64B/line 8-way, D-cache 32 KB 64B/line 8-way
cpu0: enabling thermal monitor 1 ... enabled.
cpu0: Enhanced SpeedStep (1244 mV) 2000 MHz
cpu0: unknown Enhanced SpeedStep CPU.
cpu0: calibrating local timer
cpu0: apic clock running at 199 MHz
cpu1 at mainbus0: apid 1 (application processor)
cpu1: starting
cpu1: Intel Core 2 (Merom) (686-class), 1995.23 MHz, id 0x6fd
cpu1: features bfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu1: features bfebfbff<PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX>
cpu1: features bfebfbff<FXSR,SSE,SSE2,SS,HTT,TM,SBF>
cpu1: features2 e39d<SSE3,MONITOR,DS-CPL,EST,TM2,xTPR>
cpu1: "Intel(R) Pentium(R) Dual CPU E2180 @ 2.00GHz"
cpu1: I-cache 32 KB 64B/line 8-way, D-cache 32 KB 64B/line 8-way
cpu1: using thermal monitor 1
mpbios: bus 0 is type PCI
mpbios: bus 1 is type PCI
mpbios: bus 2 is type PCI
mpbios: bus 3 is type PCI
mpbios: bus 4 is type ISA
ioapic0 at mainbus0 apid 4 (I/O APIC)
ioapic0: pa 0xfec00000, version 20, 24 pins
ioapic0: misconfigured as apic 0
ioapic0: remapped to apic 4
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
pchb0 at pci0 dev 0 function 0
pchb0: Intel 82945G/P Memory Controller Hub (rev. 0x02)
agp0 at pchb0: detected 7932k stolen memory
agp0: aperture at 0xfdf00000, size 0x10000000
vga1 at pci0 dev 2 function 0: Intel 82945G/P Integrated Graphics Device (rev. 0x02)
wsdisplay0 at vga1 kbdmux 1: console (80x25, vt100 emulation)
wsmux1: connecting to wsdisplay0
azalia0 at pci0 dev 27 function 0: Generic High Definition Audio Controller
azalia0: interrupting at ioapic0 pin 16 (irq 5)
azalia0: host: Intel 82801GB/GR High Definition Audio Controller (rev. 1)
azalia0: host: High Definition Audio rev. 1.0
ppb0 at pci0 dev 28 function 0: Intel 82801GB/GR PCI Express Port #1 (rev. 0x01)
pci1 at ppb0 bus 1
pci1: i/o space, memory space enabled, rd/line, wr/inv ok
re0 at pci1 dev 0 function 0pci_mem_find: void region
: RealTek 8168B/8111B PCIe Gigabit Ethernet (rev. 0x01)
re0: interrupting at ioapic0 pin 16 (irq 5)
re0: Ethernet address 00:01:29:0a:44:6b
re0: using 256 tx descriptors
rgephy0 at re0 phy 7: RTL8169S/8110S 1000BASE-T media interface, rev. 2
rgephy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
ppb1 at pci0 dev 28 function 1: Intel 82801GB/GR PCI Express Port #2 (rev. 0x01)
pci2 at ppb1 bus 2
pci2: i/o space, memory space enabled, rd/line, wr/inv ok
re1 at pci2 dev 0 function 0pci_mem_find: void region
: RealTek 8168B/8111B PCIe Gigabit Ethernet (rev. 0x01)
re1: interrupting at ioapic0 pin 17 (irq 10)
re1: Ethernet address 00:01:29:0a:44:6a
re1: using 256 tx descriptors
rgephy1 at re1 phy 7: RTL8169S/8110S 1000BASE-T media interface, rev. 2
rgephy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
uhci0 at pci0 dev 29 function 0: Intel 82801GB/GR USB UHCI Controller (rev. 0x01)
uhci0: interrupting at ioapic0 pin 23 (irq 9)
usb0 at uhci0: USB revision 1.0
uhub0 at usb0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1 at pci0 dev 29 function 1: Intel 82801GB/GR USB UHCI Controller (rev. 0x01)
uhci1: interrupting at ioapic0 pin 19 (irq 11)
usb1 at uhci1: USB revision 1.0
uhub1 at usb1
uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
uhci2 at pci0 dev 29 function 2: Intel 82801GB/GR USB UHCI Controller (rev. 0x01)
uhci2: interrupting at ioapic0 pin 18 (irq 11)
usb2 at uhci2: USB revision 1.0
uhub2 at usb2
uhub2: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub2: 2 ports with 2 removable, self powered
uhci3 at pci0 dev 29 function 3: Intel 82801GB/GR USB UHCI Controller (rev. 0x01)
uhci3: interrupting at ioapic0 pin 16 (irq 5)
usb3 at uhci3: USB revision 1.0
uhub3 at usb3
uhub3: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub3: 2 ports with 2 removable, self powered
ehci0 at pci0 dev 29 function 7: Intel 82801GB/GR USB EHCI Controller (rev. 0x01)
ehci0: interrupting at ioapic0 pin 23 (irq 9)
ehci0: BIOS has given up ownership
ehci0: EHCI version 1.0
ehci0: companion controllers, 2 ports each: uhci0 uhci1 uhci2 uhci3
usb4 at ehci0: USB revision 2.0
uhub4 at usb4
uhub4: Intel EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub4: 8 ports with 8 removable, self powered
ppb2 at pci0 dev 30 function 0: Intel 82801BA Hub-PCI Bridge (rev. 0xe1)
pci3 at ppb2 bus 3
pci3: i/o space, memory space enabled
pcib0 at pci0 dev 31 function 0
pcib0: Intel 82801GB/GR LPC Interface Bridge (rev. 0x01)
piixide0 at pci0 dev 31 function 2
piixide0: Intel 82801GB/GR Serial ATA/Raid Controller (ICH7) (rev. 0x01)
piixide0: bus-master DMA support present
piixide0: primary channel configured to compatibility mode
piixide0: primary channel interrupting at ioapic0 pin 14 (irq 14)
atabus0 at piixide0 channel 0
piixide0: secondary channel configured to compatibility mode
piixide0: secondary channel interrupting at ioapic0 pin 15 (irq 15)
atabus1 at piixide0 channel 1
Intel 82801GB/GR SMBus Controller (SMBus serial bus, revision 0x01) at pci0 dev 31 function 3 not configured
isa0 at pcib0
lpt0 at isa0 port 0x378-0x37b irq 7
com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, working fifo
pckbc0 at isa0 port 0x60-0x64
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pms0 at pckbc0 (aux slot)
pckbc0: using irq 12 for aux slot
wsmouse0 at pms0 mux 0
attimer0 at isa0 port 0x40-0x43: AT Timer
pcppi0 at isa0 port 0x61
pcppi0: children must have an explicit unit
midi0 at pcppi0: PC speaker (CPU-intensive output)
spkr0 at pcppi0
sysbeep0 at pcppi0
isapnp0 at isa0 port 0x279: ISA Plug 'n Play device support
npx0 at isa0 port 0xf0-0xff
npx0: reported by CPUID; using exception 16
pcppi0: attached to attimer0
isapnp0: no ISA Plug 'n Play devices found
ioapic0: enabling
timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
azalia0: codec[2]: 0x10ec/0x0662 (rev. 1.1)
azalia0: codec[2]: High Definition Audio rev. 1.0
azalia0: playback: max channels=2, encodings=1<PCM>
azalia0: playback: PCM formats=e0160<24bit,20bit,16bit,96kHz,48kHz,44.1kHz>
azalia0: recording: max channels=2, encodings=1<PCM>
azalia0: recording: PCM formats=60160<20bit,16bit,96kHz,48kHz,44.1kHz>
audio0 at azalia0: full duplex, independent
Kernelized RAIDframe activated
wd0 at atabus0 drive 0: <ST3160815AS>
wd0: drive supports 16-sector PIO transfers, LBA48 addressing
wd0: 149 GB, 310101 cyl, 16 head, 63 sec, 512 bytes/sect x 312581808 sectors
wd0: 32-bit data port
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
wd0(piixide0:0:0): using PIO mode 4, Ultra-DMA mode 6 (Ultra/133) (using DMA)
boot device: wd0
root on wd0a dumps on wd0b
root file system type: ffs
cpu1: CPU 1 running
wsdisplay0: screen 1 added (80x25, vt100 emulation)
wsdisplay0: screen 2 added (80x25, vt100 emulation)
wsdisplay0: screen 3 added (80x25, vt100 emulation)
wsdisplay0: screen 4 added (80x25, vt100 emulation)
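
To relate the vmstat(1) pin counters to devices, the "interrupting at ioapic0 pin N" lines in the dmesg above can be grouped by pin. The following is a minimal sketch over lines copied from this dmesg; the regular expression and the function name are illustrative only, and the two piixide0 channel lines are left out of the sample for brevity.

```python
import re
from collections import defaultdict

DMESG_SAMPLE = """\
azalia0: interrupting at ioapic0 pin 16 (irq 5)
re0: interrupting at ioapic0 pin 16 (irq 5)
re1: interrupting at ioapic0 pin 17 (irq 10)
uhci0: interrupting at ioapic0 pin 23 (irq 9)
uhci1: interrupting at ioapic0 pin 19 (irq 11)
uhci2: interrupting at ioapic0 pin 18 (irq 11)
uhci3: interrupting at ioapic0 pin 16 (irq 5)
ehci0: interrupting at ioapic0 pin 23 (irq 9)
"""

# Device name at the start of the line, pin number after "ioapic0 pin".
PIN_LINE = re.compile(
    r"^(\w+):.*\binterrupting at ioapic0 pin (\d+)", re.MULTILINE
)


def devices_by_pin(dmesg_text):
    """Group device names by the ioapic0 pin they report, in dmesg order."""
    pins = defaultdict(list)
    for device, pin in PIN_LINE.findall(dmesg_text):
        pins[int(pin)].append(device)
    return dict(pins)


pins = devices_by_pin(DMESG_SAMPLE)
for pin in sorted(pins):
    print(f"pin {pin}: {', '.join(pins[pin])}")
```

Among these lines, pin 17 is reported only by re1, so its vmstat(1) counter is not mixed with other devices, whereas pin 16 is shared by azalia0, re0 and uhci3.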





