Subject: panic: TLB IPI rendezvous failed (mask 4)
To: None <current-users@netbsd.org>
From: Paul Dokas <dokas@cs.umn.edu>
List: current-users
Date: 06/02/2004 11:03:45
I've got a Dell PowerEdge 6600 with 4 Xeon processors, running -current, that
I'm using as a PostgreSQL server (the dmesg is below), and I'm seeing problems
similar to those reported in PR 25285.  Under load, I get panics that look
like this (copied by hand):

  panic:  TLB IPI rendezvous failed (mask 4)
  Stopped in pid 4552.1 (postgres) at netbsd: cpu_Debugger+0x4 leave

In the debugger, the backtrace looks like this:

  cpu_Debugger()
  panic()
  pmap_tlb_shootdown()
  pmap_do_remove()
  pmap_remove()
  ubc_alloc()
  ffs_read()
  VOP_READ()
  vn_read()
  dofileread()
  sys_read()
  syscall_plain()
  --- syscall (number 3) ---

I've applied the patch suggested in the PR, but the panics are still happening.
Here's the patch that I applied:

  *** vector.S.orig       Tue May 18 10:03:48 2004
  --- vector.S    Tue May 18 10:05:03 2004
  ***************
  *** 163,169 ****
          pushl   $0
          pushl   $T_ASTFLT
          INTRENTRY
  -       movl    $0,_C_LABEL(local_apic)+LAPIC_EOI
          movl    CPUVAR(ILEVEL),%ebx
          cmpl    $IPL_IPI,%ebx
          jae     2f
  --- 163,168 ----
  ***************
  *** 173,178 ****
  --- 172,178 ----
            sti
          pushl   %ebx
          call    _C_LABEL(x86_ipi_handler)
  +       movl    $0,_C_LABEL(local_apic)+LAPIC_EOI
          jmp     _C_LABEL(Xdoreti)
    2:
          orl     $(1 << LIR_IPI),CPUVAR(IPENDING)


The next thing I'm going to do is make sure that my BIOS is up to date, but
I'm not too hopeful that it will fix this.  I suspect that there's a missed
lock or a similar SMP-related condition somewhere in the kernel, but I don't
know what a TLB IPI *is*, let alone how to find and fix this.
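
For anyone reading the panic message: in general terms, a TLB shootdown is
when one CPU changes a page mapping and sends inter-processor interrupts
(IPIs) asking the other CPUs to flush the stale entry from their TLBs, then
waits for all of them to acknowledge.  A minimal sketch of how the "mask 4"
could be read, assuming (this is not confirmed by anything in this message)
that the mask holds one bit per CPU that had not yet acknowledged when the
wait gave up.  The helper names are hypothetical and are not kernel functions:

```c
#include <stdint.h>

/*
 * Hypothetical helper: given the set of CPUs that were sent the IPI and
 * the set that acknowledged it, return the CPUs still outstanding.
 * Assumes one bit per CPU in each mask.
 */
static uint32_t
rendezvous_remaining(uint32_t targets, uint32_t acked)
{
	return targets & ~acked;
}

/*
 * Hypothetical helper: store the position of every set bit of mask in
 * cpus[] (lowest bit first) and return how many bits were set.
 */
static int
mask_to_cpus(uint32_t mask, int cpus[32])
{
	int n = 0;

	for (int bit = 0; bit < 32; bit++) {
		if (mask & (UINT32_C(1) << bit))
			cpus[n++] = bit;
	}
	return n;
}
```

Under that assumption, mask_to_cpus(4, cpus) yields a single entry, bit 2.
Whether bit 2 is indexed by APIC ID (apid 2 is cpu3 in the dmesg below) or by
attach order (cpu2) depends on how the kernel numbers CPUs, so treat either
reading as a guess until checked against the pmap source.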

Anyone have any ideas?  I've got a machine that I can reproduce this on.

Paul




NetBSD 2.0F (R.MP) #0: Tue Jun  1 13:26:04 CDT 2004
        root@r.cs.umn.edu:/usr/obj/sys/arch/i386/compile/R.MP
total memory = 3839 MB
avail memory = 3754 MB
BIOS32 rev. 0 found at 0xffe90
PCI BIOS rev. 2.1 found at 0xfc65a
pcibios: config mechanism [1][x], special cycles [x][x], last bus 24
PCI IRQ Routing Table rev. 1.0 found at 0xfc030, size 272 bytes (15 entries)
PCI Interrupt Router at 000:15:0 (ServerWorks product 0x0225 compatible)
pci_intr_fixup: no compatible PCI ICU found: ICU vendor 0x1166 product 0x0225
Warning: unable to fix up PCI interrupt routing
PCI fixup examining 1166:11
PCI fixup examining 1166:11
PCI fixup examining 1166:11
PCI fixup examining 1166:11
PCI fixup examining 9005:8f
PCI fixup examining 1002:4752
PCI fixup examining 1166:201
PCI fixup examining 1166:212
PCI fixup examining 1166:220
PCI fixup examining 1166:225
PCI fixup examining 1166:10
PCI fixup examining 1166:10
PCI fixup examining 1166:10
PCI fixup examining 1166:10
PCI fixup examining 1166:10
PCI fixup examining 1166:10
PCI bus #0 is the last bus
[System BIOS Setting]-----------------------
  device vendor product
  register space address    size
--------------------------------------------
000:00:0 0x1166 0x0011 
                [OK]
000:00:1 0x1166 0x0011 
                [OK]
000:00:2 0x1166 0x0011 
                [OK]
000:00:3 0x1166 0x0011 
                [OK]
000:03:0 0x9005 0x008f 
        10h port 0x0000ec00 0x00000100
        14h mem  0xfe102000 0x00001000
                [OK]
000:04:0 0x1002 0x4752 
        10h mem  0xfd000000 0x01000000
        14h port 0x0000e800 0x00000100
        18h mem  0xfe101000 0x00001000
                [OK]
000:15:0 0x1166 0x0201 
                [OK]
000:15:1 0x1166 0x0212 
        10h port 0x000008c0 0x00000008
        14h port 0x000008c8 0x00000004
        18h port 0x000008d0 0x00000008
        1ch port 0x000008d8 0x00000004
        20h port 0x000008b0 0x00000010
                [OK]
000:15:2 0x1166 0x0220 
        10h mem  0xfe100000 0x00001000
                [OK]
000:15:3 0x1166 0x0225 
                [OK]
000:16:0 0x1166 0x0010 
                [OK]
000:16:2 0x1166 0x0010 
                [OK]
000:17:0 0x1166 0x0010 
                [OK]
000:17:2 0x1166 0x0010 
                [OK]
000:18:0 0x1166 0x0010 
                [OK]
000:18:2 0x1166 0x0010 
                [OK]
--------------------------[  0 devices bogus]
 Physical memory end: 0xeffdc000
 PCI memory mapped I/O space start: 0xf0000000
mainbus0 (root)
mainbus0: Intel MP Specification (Version 1.4) (DELL     PE 0109     )
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel Xeon MP (686-class), 1993.70 MHz, id 0xf25
cpu0: features bfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu0: features bfebfbff<PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX>
cpu0: features bfebfbff<FXSR,SSE,SSE2,SS,HTT,TM,SBF>
cpu0: features2 4400<CID>
cpu0: "Intel(R) Xeon(TM) MP CPU 2.00GHz"
cpu0: I-cache 12K uOp cache 8-way, D-cache 8 KB 64b/line 4-way
cpu0: L2 cache 512 KB 64b/line 8-way
cpu0: ITLB 4K/4M: 64 entries
cpu0: DTLB 4K/4M: 64 entries
cpu0: running without thermal monitor!
cpu0: calibrating local timer
cpu0: apic clock running at 99 MHz
cpu0: 16 page colors
cpu1 at mainbus0: apid 4 (application processor)
cpu1: starting
cpu1: Intel Xeon MP (686-class), 1993.54 MHz, id 0xf25
cpu1: features bfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu1: features bfebfbff<PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX>
cpu1: features bfebfbff<FXSR,SSE,SSE2,SS,HTT,TM,SBF>
cpu1: features2 4400<CID>
cpu1: "Intel(R) Xeon(TM) MP CPU 2.00GHz"
cpu1: I-cache 12K uOp cache 8-way, D-cache 8 KB 64b/line 4-way
cpu1: L2 cache 512 KB 64b/line 8-way
cpu1: ITLB 4K/4M: 64 entries
cpu1: DTLB 4K/4M: 64 entries
cpu1: running without thermal monitor!
cpu2 at mainbus0: apid 6 (application processor)
cpu2: starting
cpu2: Intel Xeon MP (686-class), 1993.54 MHz, id 0xf25
cpu2: features bfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu2: features bfebfbff<PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX>
cpu2: features bfebfbff<FXSR,SSE,SSE2,SS,HTT,TM,SBF>
cpu2: features2 4400<CID>
cpu2: "Intel(R) Xeon(TM) MP CPU 2.00GHz"
cpu2: I-cache 12K uOp cache 8-way, D-cache 8 KB 64b/line 4-way
cpu2: L2 cache 512 KB 64b/line 8-way
cpu2: ITLB 4K/4M: 64 entries
cpu2: DTLB 4K/4M: 64 entries
cpu2: running without thermal monitor!
cpu3 at mainbus0: apid 2 (application processor)
cpu3: starting
cpu3: Intel Xeon MP (686-class), 1993.54 MHz, id 0xf25
cpu3: features bfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu3: features bfebfbff<PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX>
cpu3: features bfebfbff<FXSR,SSE,SSE2,SS,HTT,TM,SBF>
cpu3: features2 4400<CID>
cpu3: "Intel(R) Xeon(TM) MP CPU 2.00GHz"
cpu3: I-cache 12K uOp cache 8-way, D-cache 8 KB 64b/line 4-way
cpu3: L2 cache 512 KB 64b/line 8-way
cpu3: ITLB 4K/4M: 64 entries
cpu3: DTLB 4K/4M: 64 entries
cpu3: running without thermal monitor!
mpbios: bus 0 is type PCI   
mpbios: bus 1 is type PCI   
mpbios: bus 2 is type PCI   
mpbios: bus 3 is type PCI   
mpbios: bus 4 is type PCI   
mpbios: bus 5 is type PCI   
mpbios: bus 6 is type PCI   
mpbios: bus 7 is type PCI   
mpbios: bus 8 is type PCI   
mpbios: bus 9 is type PCI   
mpbios: bus 10 is type PCI   
mpbios: bus 11 is type PCI   
mpbios: bus 12 is type PCI   
mpbios: bus 13 is type PCI   
mpbios: bus 14 is type PCI   
mpbios: bus 15 is type PCI   
mpbios: bus 16 is type PCI   
mpbios: bus 17 is type PCI   
mpbios: bus 18 is type PCI   
mpbios: bus 19 is type PCI   
mpbios: bus 20 is type PCI   
mpbios: bus 21 is type PCI   
mpbios: bus 22 is type PCI   
mpbios: bus 23 is type PCI   
mpbios: bus 24 is type PCI   
mpbios: bus 25 is type PCI   
mpbios: bus 26 is type PCI   
mpbios: bus 27 is type PCI   
mpbios: bus 28 is type PCI   
mpbios: bus 29 is type ISA   
ioapic0 at mainbus0 apid 8 (I/O APIC)
ioapic0: pa 0xfec00000, version 11, 16 pins
ioapic0: misconfigured as apic 0
ioapic0: remapped to apic 8
ioapic1 at mainbus0 apid 9 (I/O APIC)
ioapic1: pa 0xfec01000, version 11, 16 pins
ioapic1: misconfigured as apic 0
ioapic1: remapped to apic 9
ioapic2 at mainbus0 apid 10 (I/O APIC)
ioapic2: pa 0xfec02000, version 11, 16 pins
ioapic2: misconfigured as apic 0
ioapic2: remapped to apic 10
pnpbios0 at mainbus0: code f0000, data 40, entry e2f4, control 0 eventp 0
pnpbios0: nodes 15, max len 125
PNP0C02 (mem 0-9ffff 100000-83ffffff ff000000-ffffffff f0000-fffff, io 800-890 8a0-8af 8b0-8ff 62-63 65-6f e0-ef ca2-ca7, irq 15) at pnpbios0 index 0 ignored
com0 at pnpbios0 index 2 (PNP0501)
com0: io 3f8-3ff, irq 4
com0: ns16550a, working fifo
fdc0 at pnpbios0 index 5 (PNP0700)
fdc0: io 3f0-3f5, irq 6, DMA 2
fdc0: ctl io 3f7 didn't probe. Forced attach
pckbc1 at pnpbios0 index 6 (PNP0F13): aux port
PNP0A03 (io cf8-cff) at pnpbios0 index 7 ignored
PNP0C02 (irq 9) at pnpbios0 index 8 ignored
PNP0000 (io 20-3f a0-bf 4d0-4d1, irq 2) at pnpbios0 index 9 ignored
PNP0003 (mem fee00000-feffffff fec00000-fedfffff) at pnpbios0 index 10 ignored
PNP0100 (io 40-5f, irq 0) at pnpbios0 index 11 ignored
PNP0200 (io 80-9f 0-1f c0-df, DMA 4) at pnpbios0 index 12 ignored
pckbc2 at pnpbios0 index 13 (PNP0303): kbd port
PNP0800 (io 61) at pnpbios0 index 14 ignored
PNP0B00 (io 70-7f, irq 8) at pnpbios0 index 15 ignored
PNP0C04 (io f0-ff, irq 13) at pnpbios0 index 16 ignored
PNP0C01 (io 92 c00-c01 c06-c08 c14 c50-c51 c6f cd6-cd7 f50-f58) at pnpbios0 index 18 ignored
pckbd0 at pckbc1 (kbd slot)
pckbc1: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard
pms0 at pckbc1 (aux slot)
pckbc1: using irq 12 for aux slot
wsmouse0 at pms0 mux 0
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
pchb0 at pci0 dev 0 function 0
pchb0: ServerWorks CMIC_HE Host (rev. 0x22)
pchb1 at pci0 dev 0 function 1
pchb1: ServerWorks CMIC_HE Host (rev. 0x00)
pci1 at pchb1 bus 3
pci1: no spaces enabled!
pchb2 at pci0 dev 0 function 2
pchb2: ServerWorks CMIC_HE Host (rev. 0x00)
pci2 at pchb2 bus 9
pci2: no spaces enabled!
pchb3 at pci0 dev 0 function 3
pchb3: ServerWorks CMIC_HE Host (rev. 0x00)
pci3 at pchb3 bus 19
pci3: no spaces enabled!
ahc0 at pci0 dev 3 function 0: Adaptec aic7892 Ultra160 SCSI adapter
ahc0: interrupting at ioapic1 pin 0 (irq 11)
ahc0: aic7892: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs
scsibus0 at ahc0: 16 targets, 8 luns per target
vga1 at pci0 dev 4 function 0: ATI Technologies Rage XL (rev. 0x27)
wsdisplay0 at vga1 kbdmux 1: console (80x25, vt100 emulation), using wskbd0
wsmux1: connecting to wsdisplay0
pchb4 at pci0 dev 15 function 0
pchb4: ServerWorks CSB5 SouthBridge (rev. 0x93)
pci4 at pchb4 bus 7
pci4: i/o space, memory space enabled
rccide0 at pci0 dev 15 function 1
rccide0: ServerWorks CSB5 IDE Controller (rev. 0x93)
rccide0: bus-master DMA support present
rccide0: primary channel configured to compatibility mode
rccide0: primary channel interrupting at ioapic0 pin 14 (irq 14)
atabus0 at rccide0 channel 0
rccide0: secondary channel wired to compatibility mode
rccide0: secondary channel interrupting at ioapic0 pin 15 (irq 15)
atabus1 at rccide0 channel 1
ohci0 at pci0 dev 15 function 2: ServerWorks OSB4/CSB5 USB Host Controller (rev. 0x05)
ohci0: interrupting at ioapic0 pin 10 (irq 10)
ohci0: OHCI version 1.0, legacy support
usb0 at ohci0: USB revision 1.0
uhub0 at usb0
uhub0: ServerWorks OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
pcib0 at pci0 dev 15 function 3
pcib0: ServerWorks product 0x0225 (rev. 0x00)
pchb5 at pci0 dev 16 function 0
pchb5: ServerWorks CIOB30 (rev. 0x03)
pci5 at pchb5 bus 3
pci5: i/o space, memory space enabled
pchb6 at pci0 dev 16 function 2
pchb6: ServerWorks CIOB30 (rev. 0x03)
pci6 at pchb6 bus 8
pci6: i/o space, memory space enabled
bge0 at pci6 dev 1 function 0: Broadcom BCM5700 Gigabit Ethernet
bge0: interrupting at ioapic1 pin 1 (irq 7)
bge0: ASIC BCM5700 Altima (0x7104), Ethernet address 00:0d:56:70:ee:00
brgphy0 at bge0 phy 1: BCM5411 1000BASE-T media interface, rev. 1
brgphy0: using BCM5411 DSP patch
brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
bge1 at pci6 dev 2 function 0: Broadcom BCM5700 Gigabit Ethernet
bge1: interrupting at ioapic1 pin 2 (irq 5)
bge1: ASIC BCM5700 Altima (0x7104), Ethernet address 00:0d:56:70:ee:02
brgphy1 at bge1 phy 1: BCM5411 1000BASE-T media interface, rev. 1
brgphy1: using BCM5411 DSP patch
brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
pchb7 at pci0 dev 17 function 0
pchb7: ServerWorks CIOB30 (rev. 0x03)
pci7 at pchb7 bus 9
pci7: i/o space, memory space enabled
pchb8 at pci0 dev 17 function 2
pchb8: ServerWorks CIOB30 (rev. 0x03)
pci8 at pchb8 bus 14
pci8: i/o space, memory space enabled
pchb9 at pci0 dev 18 function 0
pchb9: ServerWorks CIOB30 (rev. 0x03)
pci9 at pchb9 bus 19
pci9: i/o space, memory space enabled
pchb10 at pci0 dev 18 function 2
pchb10: ServerWorks CIOB30 (rev. 0x03)
pci10 at pchb10 bus 24
pci10: i/o space, memory space enabled
isa0 at pcib0
pcppi0 at isa0 port 0x61
midi0 at pcppi0: PC speaker
spkr0 at pcppi0
sysbeep0 at pcppi0
isapnp0 at isa0 port 0x279: ISA Plug 'n Play device support
npx0 at isa0 port 0xf0-0xff: using exception 16
isapnp0: no ISA Plug 'n Play devices found
ioapic2: enabling
ioapic1: enabling
ioapic0: enabling
fd0 at fdc0 drive 0: 1.44MB, 80 cyl, 2 head, 18 sec
Kernelized RAIDframe activated
IPsec: Initialized Security Association Processing.
scsibus0: waiting 2 seconds for devices to settle...
atapibus0 at atabus0: 2 targets
cd0 at atapibus0 drive 0: <TEAC CD-ROM CD-224E, , K.9A> cdrom removable
cd0: 32-bit data port
cd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 2 (Ultra/33)
cd0(rccide0:0:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 2 (Ultra/33) (using DMA data transfers)
sd0 at scsibus0 target 0 lun 0: <SEAGATE, ST336607LC, DS09> disk fixed
sd0: 34732 MB, 49855 cyl, 2 head, 713 sec, 512 bytes/sect x 71132959 sectors
sd0: sync (12.50ns offset 63), 16-bit (160.000MB/s) transfers, tagged queueing
sd1 at scsibus0 target 1 lun 0: <IBM, IC35L146UCDY10-0, S27F> disk fixed
sd1: 136 GB, 36703 cyl, 12 head, 651 sec, 512 bytes/sect x 286749480 sectors
sd1: sync (12.50ns offset 127), 16-bit (160.000MB/s) transfers, tagged queueing
sd2 at scsibus0 target 2 lun 0: <IBM, IC35L146UCDY10-0, S27F> disk fixed
sd2: 136 GB, 36703 cyl, 12 head, 651 sec, 512 bytes/sect x 286749480 sectors
sd2: sync (12.50ns offset 127), 16-bit (160.000MB/s) transfers, tagged queueing
sd3 at scsibus0 target 3 lun 0: <IBM, IC35L146UCDY10-0, S27F> disk fixed
sd3: 136 GB, 36703 cyl, 12 head, 651 sec, 512 bytes/sect x 286749480 sectors
sd3: sync (12.50ns offset 127), 16-bit (160.000MB/s) transfers, tagged queueing
sd4 at scsibus0 target 4 lun 0: <IBM, IC35L146UCDY10-0, S27F> disk fixed
sd4: 136 GB, 36703 cyl, 12 head, 651 sec, 512 bytes/sect x 286749480 sectors
sd4: sync (12.50ns offset 127), 16-bit (160.000MB/s) transfers, tagged queueing
ses0 at scsibus0 target 6 lun 0: <PE/PV, 1x8 SCSI BP, 1.1> processor fixed
ses0: SAF-TE Compliant Device
ses0: async, 8-bit transfers
boot device: sd0
root on sd0a dumps on sd0b
root file system type: ffs
cpu3: CPU 2 running
cpu1: CPU 4 running
cpu2: CPU 6 running
raid0: Component /dev/sd1a being configured at col: 0
         Column: 0 Num Columns: 4
         Version: 2 Serial Number: 86587 Mod Counter: 355
         Clean: Yes Status: 0
raid0: Component /dev/sd2a being configured at col: 1
         Column: 1 Num Columns: 4
         Version: 2 Serial Number: 86587 Mod Counter: 355
         Clean: Yes Status: 0
raid0: Component /dev/sd3a being configured at col: 2
         Column: 2 Num Columns: 4
         Version: 2 Serial Number: 86587 Mod Counter: 355
         Clean: Yes Status: 0
raid0: Component /dev/sd4a being configured at col: 3
         Column: 3 Num Columns: 4
         Version: 2 Serial Number: 86587 Mod Counter: 355
         Clean: Yes Status: 0
raid0: RAID Level 0
raid0: Components: /dev/sd1a /dev/sd2a /dev/sd3a /dev/sd4a
raid0: Total Sectors: 1146997504 (560057 MB)
wsdisplay0: screen 1 added (80x25, vt100 emulation)
wsdisplay0: screen 2 added (80x25, vt100 emulation)
wsdisplay0: screen 3 added (80x25, vt100 emulation)
wsdisplay0: screen 4 added (80x25, vt100 emulation)

-- 
Paul Dokas                                            dokas@cs.umn.edu
======================================================================
Don Juan Matus:  "an enigma wrapped in mystery wrapped in a tortilla."