Subject: kern/32757: TLB IPI rendezvous fails sometimes
To: None <kern-bug-people@netbsd.org, gnats-admin@netbsd.org,>
From: seebs <seebs@vash.cel.plethora.net>
List: netbsd-bugs
Date: 02/06/2006 09:50:01
>Number:         32757
>Category:       kern
>Synopsis:       kernel occasionally panics with "TLB IPI rendezvous failed"
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Feb 06 09:50:00 +0000 2006
>Originator:     seebs
>Release:        NetBSD 2.1
>Organization:
>Environment:
NetBSD ns1.cheetah.net 2.1 NetBSD 2.1 (CHEETAH) #0: Thu Dec 29 04:02:46 PST 2005  beta1@ns1.cheetah.net:/usr/src/2.1/usr/src/sys/arch/i386/compile/CHEETAH i386
Architecture: i386
Machine: i386
>Description:
	On at least some motherboards, NetBSD 2.1 running SMP occasionally
	panics with "TLB IPI rendezvous failed".  The patch from pmap.c 1.184
	has been verified to be present.
>How-To-Repeat:
	Run an SMP kernel under load.

	Someone else on the NetBSD lists reports the same behavior on a
	Pentium 3 system, which suggests that this isn't specific to one
	motherboard, though it is evidently rare.  Full dmesg output from
	the system in single-processor mode follows.  (It's not stable
	enough in SMP mode to run in production.)

NetBSD 2.1 (CHEETAH) #0: Thu Dec 29 04:02:46 PST 2005
	beta1@ns1.cheetah.net:/usr/src/2.1/usr/src/sys/arch/i386/compile/CHEETAH
total memory = 1022 MB
avail memory = 996 MB
BIOS32 rev. 0 found at 0xfd6d0
mainbus0 (root)
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel Xeon (686-class), 3065.96 MHz, id 0xf29
cpu0: features bfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu0: features bfebfbff<PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX>
cpu0: features bfebfbff<FXSR,SSE,SSE2,SS,HTT,TM,SBF>
cpu0: I-cache 12K uOp cache 8-way, D-cache 8 KB 64B/line 4-way
cpu0: L2 cache 512 KB 64B/line 8-way
cpu0: ITLB 4K/4M: 64 entries
cpu0: DTLB 4K/4M: 64 entries
cpu0: calibrating local timer
cpu0: apic clock running at 133 MHz
cpu0: 16 page colors
cpu1 at mainbus0: apid 6 (application processor)
cpu1: not started
cpu2 at mainbus0: apid 1 (application processor)
cpu2: not started
cpu3 at mainbus0: apid 7 (application processor)
cpu3: not started
ioapic0 at mainbus0 apid 2 (I/O APIC)
ioapic0: pa 0xfec00000, version 20, 24 pins
ioapic1 at mainbus0 apid 3 (I/O APIC)
ioapic1: pa 0xfec80000, version 20, 24 pins
ioapic2 at mainbus0 apid 4 (I/O APIC)
ioapic2: pa 0xfec80100, version 20, 24 pins
cpu4 at mainbus0: (uniprocessor)
cpu4: Intel Xeon (686-class), 3065.82 MHz, id 0xf29
cpu4: features bfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu4: features bfebfbff<PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX>
cpu4: features bfebfbff<FXSR,SSE,SSE2,SS,HTT,TM,SBF>
cpu4: I-cache 12K uOp cache 8-way, D-cache 8 KB 64B/line 4-way
cpu4: L2 cache 512 KB 64B/line 8-way
cpu4: ITLB 4K/4M: 64 entries
cpu4: DTLB 4K/4M: 64 entries
acpi0 at mainbus0
acpi0: using Intel ACPI CA subsystem version 20040211
acpi0: X/RSDT: OemId <PTLTD ,  RSDT  ,06040000>, AslId < LTP,00000000>
acpi0: SCI interrupting at int 9
acpi0: fixed-feature power button present
ACPI Object Type 'Processor' (0x0c) at acpi0 not configured
ACPI Object Type 'Processor' (0x0c) at acpi0 not configured
ACPI Object Type 'Processor' (0x0c) at acpi0 not configured
ACPI Object Type 'Processor' (0x0c) at acpi0 not configured
acpi: activated PNP0C0F
acpi: activated PNP0C0F
PNP0A03 [PCI Bus] at acpi0 not configured
PNP0C02 [Plug and Play motherboard register resources] at acpi0 not configured
PNP0200 [AT DMA Controller] at acpi0 not configured
PNP0C04 [Math Coprocessor] at acpi0 not configured
PNP0000 [AT Interrupt Controller] at acpi0 not configured
PNP0B00 [AT Real-Time Clock] at acpi0 not configured
PNP0800 [AT-style speaker sound] at acpi0 not configured
PNP0100 [AT Timer] at acpi0 not configured
PNP0303 [IBM Enhanced (101/102-key, PS/2 mouse support)] at acpi0 not configured
PNP0F13 [PS/2 Port for PS/2-style Mice] at acpi0 not configured
PNP0C0F [PCI interrupt link device] at acpi0 not configured
PNP0C0F [PCI interrupt link device] at acpi0 not configured
PNP0C0F [PCI interrupt link device] at acpi0 not configured
PNP0C0F [PCI interrupt link device] at acpi0 not configured
PNP0C0F [PCI interrupt link device] at acpi0 not configured
PNP0C0F [PCI interrupt link device] at acpi0 not configured
PNP0C0F [PCI interrupt link device] at acpi0 not configured
PNP0C0F [PCI interrupt link device] at acpi0 not configured
INT0800 at acpi0 not configured
PNP0A05 [Generic ACPI Bus] at acpi0 not configured
PNP0501 [16550A-compatible COM port] at acpi0 not configured
PNP0501 [16550A-compatible COM port] at acpi0 not configured
PNP0700 [PC standard floppy disk controller] at acpi0 not configured
PNP0401 [ECP printer port] at acpi0 not configured
PNP0C0C [ACPI power button device] at acpi0 not configured
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
pchb0 at pci0 dev 0 function 0
pchb0: Intel E7505 MCH Host (rev. 0x03)
agp0 at pchb0: using generic initialization for Intel AGP
agp0: aperture at 0xf4000000, size 0x4000000
Intel E7505 MCH RAS Controller (undefined subclass 0x00, revision 0x03) at pci0 dev 0 function 1 not configured
ppb0 at pci0 dev 1 function 0: Intel E7505 MCH Host-to-AGP Bridge (rev. 0x03)
pci1 at ppb0 bus 1
pci1: i/o space, memory space enabled
ppb1 at pci0 dev 2 function 0: Intel E7505 MCH HI_B PCI-to-PCI (rev. 0x03)
pci2 at ppb1 bus 2
pci2: i/o space, memory space enabled
Intel 82870P2 P64H2 IOxAPIC (interrupt system, interface 0x20, revision 0x04) at pci2 dev 28 function 0 not configured
ppb2 at pci2 dev 29 function 0: Intel 82870P2 P64H2 PCI-to-PCI Bridge (rev. 0x04)
pci3 at ppb2 bus 3
pci3: i/o space, memory space enabled
wm0 at pci3 dev 3 function 0: Intel i82545EM 1000BASE-T Ethernet, rev. 1
wm0: interrupting at ioapic2 pin 6 (irq 12)
wm0: 64-bit 133MHz PCIX bus
wm0: 256 word (8 address bits) MicroWire EEPROM
wm0: Ethernet address 00:30:48:73:a3:13
ukphy0 at wm0 phy 1: Generic IEEE 802.3u media interface
ukphy0: Marvell 88E1011 Gigabit PHY (OUI 0x000ac2, model 0x0002), rev. 3
ukphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
Intel 82870P2 P64H2 IOxAPIC (interrupt system, interface 0x20, revision 0x04) at pci2 dev 30 function 0 not configured
ppb3 at pci2 dev 31 function 0: Intel 82870P2 P64H2 PCI-to-PCI Bridge (rev. 0x04)
pci4 at ppb3 bus 4
pci4: i/o space, memory space enabled
ahd0 at pci4 dev 3 function 0
ahd0: interrupting at ioapic1 pin 8 (irq 12)
ahd0: aic7902: Ultra320 Wide Channel A, SCSI Id=7, PCI-X 101-133Mhz, 512 SCBs
scsibus0 at ahd0: 16 targets, 8 luns per target
ahd1 at pci4 dev 3 function 1
ahd1: interrupting at ioapic1 pin 9 (irq 12)
ahd1: aic7902: Ultra320 Wide Channel B, SCSI Id=7, PCI-X 101-133Mhz, 512 SCBs
scsibus1 at ahd1: 16 targets, 8 luns per target
uhci0 at pci0 dev 29 function 0: Intel 82801DB/DBM USB UHCI Controller #1 (rev. 0x02)
uhci0: interrupting at ioapic0 pin 16 (irq 11)
usb0 at uhci0: USB revision 1.0
uhub0 at usb0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1 at pci0 dev 29 function 1: Intel 82801DB/DBM USB UHCI Controller #2 (rev. 0x02)
uhci1: interrupting at ioapic0 pin 19 (irq 10)
usb1 at uhci1: USB revision 1.0
uhub1 at usb1
uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
uhci2 at pci0 dev 29 function 2: Intel 82801DB/DBM USB UHCI Controller #3 (rev. 0x02)
uhci2: interrupting at ioapic0 pin 18 (irq 5)
usb2 at uhci2: USB revision 1.0
uhub2 at usb2
uhub2: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub2: 2 ports with 2 removable, self powered
ehci0 at pci0 dev 29 function 7: Intel 82801DB/DBM USB EHCI Controller (rev. 0x02)
ehci0: interrupting at ioapic0 pin 23 (irq 12)
ehci0: EHCI version 1.0
ehci0: companion controllers, 2 ports each: uhci0 uhci1 uhci2
usb3 at ehci0: USB revision 2.0
uhub3 at usb3
uhub3: Intel EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub3: 6 ports with 6 removable, self powered
ppb4 at pci0 dev 30 function 0: Intel 82801BA Hub-to-PCI Bridge (rev. 0x82)
pci5 at ppb4 bus 5
pci5: i/o space, memory space enabled
ex0 at pci5 dev 1 function 0: 3Com 3c905B-TX 10/100 Ethernet (rev. 0x30)
ex0: interrupting at ioapic0 pin 16 (irq 11)
ex0: MAC address 00:10:5a:83:1b:65
exphy0 at ex0 phy 24: 3Com internal media interface
exphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
vga0 at pci5 dev 2 function 0: ATI Technologies Rage XL (AGP) (rev. 0x65)
wsdisplay0 at vga0 kbdmux 1: console (80x25, vt100 emulation)
wsmux1: connecting to wsdisplay0
ichlpcib0 at pci0 dev 31 function 0
ichlpcib0: Intel 82801DB LPC Interface Bridge (rev. 0x02)
ichlpcib0: TCO (watchdog) timer configured.
pciide0 at pci0 dev 31 function 1
pciide0: Intel 82801DB IDE Controller (UltraATA/100) (rev. 0x02)
pciide0: bus-master DMA support present, but unused (no driver support)
pciide0: primary channel configured to compatibility mode
pciide0: primary channel ignored (not responding; disabled or no drives?)
pciide0: secondary channel configured to compatibility mode
pciide0: secondary channel interrupting at ioapic0 pin 15 (irq 15)
atabus0 at pciide0 channel 1
Intel 82801DB/DBM SMBus Controller (SMBus serial bus, revision 0x02) at pci0 dev 31 function 3 not configured
Intel 82801DB/DBM AC97 Audio Controller (audio multimedia, revision 0x02) at pci0 dev 31 function 5 not configured
isa0 at ichlpcib0
lpt0 at isa0 port 0x378-0x37b irq 7
com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, working fifo
pckbc0 at isa0 port 0x60-0x64
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pcppi0 at isa0 port 0x61
sysbeep0 at pcppi0
npx0 at isa0 port 0xf0-0xff: using exception 16
ioapic2: enabling
ioapic1: enabling
ioapic0: enabling
IPsec: Initialized Security Association Processing.
scsibus0: waiting 2 seconds for devices to settle...
scsibus1: waiting 2 seconds for devices to settle...
atapibus0 at atabus0: 2 targets
cd0 at atapibus0 drive 0: <TSSTcorpCD/DVDW TS-H552B, , TS05> cdrom removable
cd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 2 (Ultra/33)
sd0 at scsibus0 target 0 lun 0: <SEAGATE, ST373207LW, 0003> disk fixed
sd0: 70007 MB, 90774 cyl, 2 head, 789 sec, 512 bytes/sect x 143374744 sectors
sd0: sync (6.25ns offset 63), 16-bit (320.000MB/s) transfers, tagged queueing
boot device: sd0
root on sd0a dumps on sd0b
root file system type: ffs
wsdisplay0: screen 1 added (80x25, vt100 emulation)
wsdisplay0: screen 2 added (80x25, vt100 emulation)
wsdisplay0: screen 3 added (80x25, vt100 emulation)
wsdisplay0: screen 4 added (80x25, vt100 emulation)

>Fix:
	Workaround:  Run single-processor.

	Actual fix:  Not known.  The other person I talked to in late December
	had a patch that simply retried the rendezvous (possibly re-sending
	something; I don't have the patch).  That apparently worked, but it
	implies that there is a race condition somewhere.
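
	For illustration only, here is a rough sketch of what "retry the
	rendezvous" could look like.  This is NOT the patch mentioned above
	(which I don't have) and it is not NetBSD pmap code: it is a
	single-threaded userland toy model.  All names (struct sim,
	sim_send_ipi, tlb_rendezvous_retry, SPIN_LIMIT, MAX_RETRIES) are
	hypothetical, and the "lost IPI" behavior is an assumption made purely
	to show the retry logic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SPIN_LIMIT  1000  /* hypothetical bound on spins per attempt */
#define MAX_RETRIES 3     /* hypothetical number of re-sends allowed */

/* Toy model of the machine; every field here is an assumption. */
struct sim {
	uint32_t pending;    /* bit n set: CPU n has not acknowledged yet */
	uint32_t drop_next;  /* bit n set: the next IPI to CPU n is lost */
	uint32_t dead;       /* bit n set: CPU n never acknowledges */
	int      ipis_sent;  /* total IPIs sent, for inspection */
};

/*
 * Model sending a shootdown IPI to every CPU in mask.  A delivered IPI
 * is acknowledged at once (its pending bit is cleared); a lost IPI or
 * a dead CPU leaves the pending bit set.
 */
static void
sim_send_ipi(struct sim *s, uint32_t mask)
{
	for (int cpu = 0; cpu < 32; cpu++) {
		uint32_t bit = UINT32_C(1) << cpu;

		if ((mask & bit) == 0)
			continue;
		s->ipis_sent++;
		if (s->dead & bit)
			continue;               /* never acknowledges */
		if (s->drop_next & bit)
			s->drop_next &= ~bit;   /* lost once, next one works */
		else
			s->pending &= ~bit;     /* target acknowledges */
	}
}

/*
 * Rendezvous with bounded retry.  Returns the number of re-sends that
 * were needed (0 if the first attempt worked), or -1 if some CPU still
 * had not acknowledged after MAX_RETRIES re-sends.  A kernel would
 * panic where this returns -1.
 */
static int
tlb_rendezvous_retry(struct sim *s, uint32_t targets)
{
	assert(s != NULL);

	s->pending = targets;
	sim_send_ipi(s, targets);

	for (int attempt = 0; attempt <= MAX_RETRIES; attempt++) {
		/*
		 * In a kernel, other CPUs would clear their bits while we
		 * spin.  In this model nothing changes during the spin.
		 */
		for (int spin = 0; spin < SPIN_LIMIT; spin++) {
			if (s->pending == 0)
				return attempt;
		}
		/* Timed out: re-send only to CPUs that have not acked. */
		if (attempt < MAX_RETRIES)
			sim_send_ipi(s, s->pending);
	}
	return -1;
}
```

	In this model, a single lost IPI is recovered by one re-send, while a
	CPU that never responds still ends in failure after the retry budget
	is used up.  As noted above, a retry like this would only paper over
	whatever race is actually losing the acknowledgement.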