Subject: port-amd64/31360: profiling kernel on SMP amd64 panics
To: None <port-amd64-maintainer@netbsd.org, gnats-admin@netbsd.org,>
From: None <oster@cs.usask.ca>
List: netbsd-bugs
Date: 09/20/2005 17:07:00
>Number:         31360
>Category:       port-amd64
>Synopsis:       profiling kernel on SMP amd64 panics
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    port-amd64-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Sep 20 17:07:00 +0000 2005
>Originator:     oster@cs.usask.ca
>Release:        NetBSD 3.99.9
>Organization:
>Environment:
System: NetBSD quadrapus 3.99.9 NetBSD 3.99.9 (QUADRAPUS.ddb) #0: Mon Sep 19 13:58:19 CST 2005  root@quadrapus:/usr/local/current/src/sys/arch/amd64/compile/QUADRAPUS.ddb amd64
Architecture: amd64
Machine: amd64
>Description:
	Attempting to do kernel profiling on a quad-opteron box
results in a hung system.  dmesg is as follows:

Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005
    The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
    The Regents of the University of California.  All rights reserved.

NetBSD 3.99.9 (QUADRAPUS.ddb.PROF) #0: Mon Sep 19 15:29:17 CST 2005
        root@quadrapus:/usr/local/current/src/sys/arch/amd64/compile/QUADRAPUS.ddb.PROF
total memory = 7743 MB
avail memory = 7462 MB
mainbus0 (root)
mainbus0: Intel MP Specification (Version 1.4) (SUN      SunFire V40z)
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: AMD Opteron(tm) Processor 850, 2393.32 MHz
cpu0: features: e7dbfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu0: features: e7dbfbff<PGE,MCA,CMOV,PAT,PSE36,MPC,NOX,MMXX,MMX>
cpu0: features: e7dbfbff<FXSR,SSE,SSE2,LONG,3DNOW2,3DNOW>
cpu0: I-cache 64 KB 64B/line 2-way, D-cache 64 KB 64B/line 2-way
cpu0: L2 cache 1 MB 64B/line 16-way
cpu0: ITLB 32 4 KB entries fully associative, 8 4 MB entries fully associative
cpu0: DTLB 32 4 KB entries fully associative, 8 4 MB entries fully associative
cpu0: calibrating local timer
cpu0: apic clock running at 199 MHz
cpu0: 16 page colors
cpu1 at mainbus0: apid 1 (application processor)
cpu1: starting
cpu1: AMD Opteron(tm) Processor 850, 2393.18 MHz
cpu1: features: e7dbfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu1: features: e7dbfbff<PGE,MCA,CMOV,PAT,PSE36,MPC,NOX,MMXX,MMX>
cpu1: features: e7dbfbff<FXSR,SSE,SSE2,LONG,3DNOW2,3DNOW>
cpu1: I-cache 64 KB 64B/line 2-way, D-cache 64 KB 64B/line 2-way
cpu1: L2 cache 1 MB 64B/line 16-way
cpu1: ITLB 32 4 KB entries fully associative, 8 4 MB entries fully associative
cpu1: DTLB 32 4 KB entries fully associative, 8 4 MB entries fully associative
cpu2 at mainbus0: apid 2 (application processor)
cpu2: starting
cpu2: AMD Opteron(tm) Processor 850, 2393.18 MHz
cpu2: features: e7dbfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu2: features: e7dbfbff<PGE,MCA,CMOV,PAT,PSE36,MPC,NOX,MMXX,MMX>
cpu2: features: e7dbfbff<FXSR,SSE,SSE2,LONG,3DNOW2,3DNOW>
cpu2: I-cache 64 KB 64B/line 2-way, D-cache 64 KB 64B/line 2-way
cpu2: L2 cache 1 MB 64B/line 16-way
cpu2: ITLB 32 4 KB entries fully associative, 8 4 MB entries fully associative
cpu2: DTLB 32 4 KB entries fully associative, 8 4 MB entries fully associative
cpu3 at mainbus0: apid 3 (application processor)
cpu3: starting
cpu3: AMD Opteron(tm) Processor 850, 2393.18 MHz
cpu3: features: e7dbfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu3: features: e7dbfbff<PGE,MCA,CMOV,PAT,PSE36,MPC,NOX,MMXX,MMX>
cpu3: features: e7dbfbff<FXSR,SSE,SSE2,LONG,3DNOW2,3DNOW>
cpu3: I-cache 64 KB 64B/line 2-way, D-cache 64 KB 64B/line 2-way
cpu3: L2 cache 1 MB 64B/line 16-way
cpu3: ITLB 32 4 KB entries fully associative, 8 4 MB entries fully associative
cpu3: DTLB 32 4 KB entries fully associative, 8 4 MB entries fully associative
mpbios: bus 0 is type PCI   
mpbios: bus 1 is type PCI   
mpbios: bus 2 is type PCI   
mpbios: bus 3 is type PCI   
mpbios: bus 32 is type PCI   
mpbios: bus 33 is type PCI   
mpbios: bus 37 is type PCI   
mpbios: bus 41 is type PCI   
mpbios: bus 45 is type PCI   
mpbios: bus 49 is type ISA   
ioapic0 at mainbus0 apid 4 (I/O APIC)
ioapic0: pa 0xfec00000, version 11, 24 pins
ioapic1 at mainbus0 apid 5 (I/O APIC)
ioapic1: pa 0xe4000000, version 11, 4 pins
ioapic2 at mainbus0 apid 6 (I/O APIC)
ioapic2: pa 0xe4001000, version 11, 4 pins
ioapic3 at mainbus0 apid 7 (I/O APIC)
ioapic3: pa 0xe5b01000, version 11, 4 pins
ioapic4 at mainbus0 apid 8 (I/O APIC)
ioapic4: pa 0xe5b03000, version 11, 4 pins
ioapic5 at mainbus0 apid 9 (I/O APIC)
ioapic5: pa 0xe5b05000, version 11, 4 pins
ioapic6 at mainbus0 apid 10 (I/O APIC)
ioapic6: pa 0xe5b07000, version 11, 4 pins
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
ppb0 at pci0 dev 6 function 0: Advanced Micro Devices AMD8111 I/O Hub (rev. 0x07)
pci1 at ppb0 bus 1
pci1: i/o space, memory space enabled
ohci0 at pci1 dev 0 function 0: Advanced Micro Devices AMD8111 USB Host Controller (rev. 0x0b)
ohci0: interrupting at ioapic0 pin 19 (irq 11)
ohci0: OHCI version 1.0, legacy support
usb0 at ohci0: USB revision 1.0
uhub0 at usb0
uhub0: Advanced Micro OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 3 ports with 3 removable, self powered
ohci1 at pci1 dev 0 function 1: Advanced Micro Devices AMD8111 USB Host Controller (rev. 0x0b)
ohci1: interrupting at ioapic0 pin 19 (irq 11)
ohci1: OHCI version 1.0, legacy support
usb1 at ohci1: USB revision 1.0
uhub1 at usb1
uhub1: Advanced Micro OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 3 ports with 3 removable, self powered
vga0 at pci1 dev 5 function 0: Trident Microsystems product 0x9880 (rev. 0x3a)
wsdisplay0 at vga0 kbdmux 1
wsmux1: connecting to wsdisplay0
pcib0 at pci0 dev 7 function 0
pcib0: Advanced Micro Devices AMD8111 LPC Controller (rev. 0x05)
viaide0 at pci0 dev 7 function 1
viaide0: Advanced Micro Devices AMD8111 IDE Controller (rev. 0x03)
viaide0: bus-master DMA support present
viaide0: primary channel configured to compatibility mode
viaide0: primary channel interrupting at ioapic0 pin 14 (irq 14)
atabus0 at viaide0 channel 0
viaide0: secondary channel configured to compatibility mode
viaide0: secondary channel interrupting at ioapic0 pin 15 (irq 15)
atabus1 at viaide0 channel 1
Advanced Micro Devices AMD8111 ACPI Controller (miscellaneous bridge, revision 0x05) at pci0 dev 7 function 3 not configured
ppb1 at pci0 dev 10 function 0: Advanced Micro Devices AMD8131 PCI-X Tunnel (rev. 0x12)
pci2 at ppb1 bus 2
pci2: i/o space, memory space enabled
bge0 at pci2 dev 2 function 0: Broadcom BCM5703X Gigabit Ethernet
bge0: interrupting at ioapic1 pin 1 (irq 5)
bge0: ASIC BCM5703 A2 (0x1002), Ethernet address 00:09:3d:12:ac:c6
brgphy0 at bge0 phy 1: BCM5703 1000BASE-T media interface, rev. 2
brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
bge1 at pci2 dev 3 function 0: Broadcom BCM5703X Gigabit Ethernet
bge1: interrupting at ioapic1 pin 2 (irq 7)
bge1: ASIC BCM5703 A2 (0x1002), Ethernet address 00:09:3d:12:ac:c7
brgphy1 at bge1 phy 1: BCM5703 1000BASE-T media interface, rev. 2
brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
mpt0 at pci2 dev 4 function 0: LSI Logic 53c1030 Ultra320 SCSI
mpt0: interrupting at ioapic1 pin 3 (irq 11)
scsibus0 at mpt0: 16 targets, 8 luns per target
aapic0 at pci0 dev 10 function 1: Advanced Micro Devices AMD8131 IO Apic (rev. 0x01)
ppb2 at pci0 dev 11 function 0: Advanced Micro Devices AMD8131 PCI-X Tunnel (rev. 0x12)
pci3 at ppb2 bus 3
pci3: i/o space, memory space enabled
aapic1 at pci0 dev 11 function 1: Advanced Micro Devices AMD8131 IO Apic (rev. 0x01)
pchb0 at pci0 dev 24 function 0
pchb0: Advanced Micro Devices AMD64 HyperTransport configuration (rev. 0x00)
pchb1 at pci0 dev 24 function 1
pchb1: Advanced Micro Devices AMD64 Address Map configuration (rev. 0x00)
pchb2 at pci0 dev 24 function 2
pchb2: Advanced Micro Devices AMD64 DRAM configuration (rev. 0x00)
pchb3 at pci0 dev 24 function 3
pchb3: Advanced Micro Devices AMD64 Miscellaneous configuration (rev. 0x00)
pchb4 at pci0 dev 25 function 0
pchb4: Advanced Micro Devices AMD64 HyperTransport configuration (rev. 0x00)
pchb5 at pci0 dev 25 function 1
pchb5: Advanced Micro Devices AMD64 Address Map configuration (rev. 0x00)
pchb6 at pci0 dev 25 function 2
pchb6: Advanced Micro Devices AMD64 DRAM configuration (rev. 0x00)
pchb7 at pci0 dev 25 function 3
pchb7: Advanced Micro Devices AMD64 Miscellaneous configuration (rev. 0x00)
pchb8 at pci0 dev 26 function 0
pchb8: Advanced Micro Devices AMD64 HyperTransport configuration (rev. 0x00)
pchb9 at pci0 dev 26 function 1
pchb9: Advanced Micro Devices AMD64 Address Map configuration (rev. 0x00)
pchb10 at pci0 dev 26 function 2
pchb10: Advanced Micro Devices AMD64 DRAM configuration (rev. 0x00)
pchb11 at pci0 dev 26 function 3
pchb11: Advanced Micro Devices AMD64 Miscellaneous configuration (rev. 0x00)
pchb12 at pci0 dev 27 function 0
pchb12: Advanced Micro Devices AMD64 HyperTransport configuration (rev. 0x00)
pchb13 at pci0 dev 27 function 1
pchb13: Advanced Micro Devices AMD64 Address Map configuration (rev. 0x00)
pchb14 at pci0 dev 27 function 2
pchb14: Advanced Micro Devices AMD64 DRAM configuration (rev. 0x00)
pchb15 at pci0 dev 27 function 3
pchb15: Advanced Micro Devices AMD64 Miscellaneous configuration (rev. 0x00)
isa0 at pcib0
com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
com0: console
com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, working fifo
pckbc0 at isa0 port 0x60-0x64
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0 mux 1
wskbd0: connecting to wsdisplay0
pms0 at pckbc0 (aux slot)
pckbc0: using irq 12 for aux slot
wsmouse0 at pms0 mux 0
attimer0 at isa0 port 0x40-0x43: AT Timer
pcppi0 at isa0 port 0x61
midi0 at pcppi0: PC speaker
sysbeep0 at pcppi0
fdc0 at isa0 port 0x3f0-0x3f7 irq 6 drq 2
pcppi0: attached to attimer0
pci4 at mainbus0 bus 32
pci4: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
ppb3 at pci4 dev 1 function 0: Advanced Micro Devices AMD8131 PCI-X Tunnel (rev. 0x12)
pci5 at ppb3 bus 33
pci5: i/o space, memory space enabled
aapic2 at pci4 dev 1 function 1: Advanced Micro Devices AMD8131 IO Apic (rev. 0x01)
ppb4 at pci4 dev 2 function 0: Advanced Micro Devices AMD8131 PCI-X Tunnel (rev. 0x12)
pci6 at ppb4 bus 37
pci6: i/o space, memory space enabled
aapic3 at pci4 dev 2 function 1: Advanced Micro Devices AMD8131 IO Apic (rev. 0x01)
ppb5 at pci4 dev 3 function 0: Advanced Micro Devices AMD8131 PCI-X Tunnel (rev. 0x12)
pci7 at ppb5 bus 41
pci7: i/o space, memory space enabled
aapic4 at pci4 dev 3 function 1: Advanced Micro Devices AMD8131 IO Apic (rev. 0x01)
ppb6 at pci4 dev 4 function 0: Advanced Micro Devices AMD8131 PCI-X Tunnel (rev. 0x12)
pci8 at ppb6 bus 45
pci8: i/o space, memory space enabled
aapic5 at pci4 dev 4 function 1: Advanced Micro Devices AMD8131 IO Apic (rev. 0x01)
ioapic0: enabling
ioapic1: enabling
ioapic2: enabling
ioapic3: enabling
ioapic4: enabling
ioapic5: enabling
ioapic6: enabling
fd0 at fdc0 drive 0: 1.44MB, 80 cyl, 2 head, 18 sec
raidattach: Asked for 8 units
Kernelized RAIDframe activated
Profiling kernel, textsize=7054864 [ffffffff80100000..ffffffff807ba610]
scsibus0: waiting 2 seconds for devices to settle...
atapibus0 at atabus1: 2 targets
cd0 at atapibus0 drive 0: <MATSHITADVD-ROM SR-8178, , PY19> cdrom removable
cd0: 32-bit data port
cd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 2 (Ultra/33)
cd0(viaide0:1:0): using PIO mode 4, Ultra-DMA mode 2 (Ultra/33) (using DMA)
sd0 at scsibus0 target 0 lun 0: <SEAGATE, ST373207LC, 0002> disk fixed
sd0: 70007 MB, 90774 cyl, 2 head, 789 sec, 512 bytes/sect x 143374744 sectors
sd0: sync (6.25ns offset 63), 16-bit (320.000MB/s) transfers, tagged queueing
sd1 at scsibus0 target 1 lun 0: <SEAGATE, ST373207LC, 0002> disk fixed
sd1: 70007 MB, 90774 cyl, 2 head, 789 sec, 512 bytes/sect x 143374744 sectors
sd1: sync (6.25ns offset 63), 16-bit (320.000MB/s) transfers, tagged queueing
ses0 at scsibus0 target 6 lun 0: <SDR, GEM318P, 1> processor fixed
ses0: SAF-TE Compliant Device
ses0: async, 8-bit transfers
Searching for RAID components...
Component on: sd0a: 143374681
   Row: 0 Column: 0 Num Rows: 1 Num Columns: 2
   Version: 2 Serial Number: 2005071501 Mod Counter: 300
   Clean: Yes Status: 0
   sectPerSU: 128 SUsPerPU: 1 SUsPerRU: 1
   RAID Level: 1  blocksize: 512 numBlocks: 143374592
   Autoconfig: Yes
   Contains root partition: Yes
   Last configured as: raid0
Component on: sd1a: 143374681
   Row: 0 Column: 1 Num Rows: 1 Num Columns: 2
   Version: 2 Serial Number: 2005071501 Mod Counter: 300
   Clean: Yes Status: 0
   sectPerSU: 128 SUsPerPU: 1 SUsPerRU: 1
   RAID Level: 1  blocksize: 512 numBlocks: 143374592
   Autoconfig: Yes
   Contains root partition: Yes
   Last configured as: raid0
Found: sd0a at 0
Found: sd1a at 1
RAID autoconfigure
Configuring raid0:
Starting autoconfiguration of RAID set...
Looking for 0 in autoconfig
Found: sd0a at 0
Looking for 1 in autoconfig
Found: sd1a at 1
raid0: allocating 20 buffers of 65536 bytes.
raid0: RAID Level 1
raid0: Components: /dev/sd0a /dev/sd1a
raid0: Total Sectors: 143374592 (70007 MB)
boot device: raid0
root on raid0a dumps on raid0b
mountroot: trying lfs...
mountroot: trying ffs...
root file system type: ffs
cpu1: CPU 1 running
init: copying out path `/sbin/init' 11
cpu2: CPU 2 running
cpu3: CPU 3 running
wsdisplay0: screen 1 added (80x25, vt100 emulation)
wsdisplay0: screen 2 added (80x25, vt100 emulation)
wsdisplay0: screen 3 added (80x25, vt100 emulation)
wsdisplay0: screen 4 added (80x25, vt100 emulation)

>How-To-Repeat:

	.../configure -p KERNEL_CONFIG
	boot new kernel with profiling
	turn on kernel profiling with 'kgmon -b'
	wait a bit, and/or start to stress the system
	observe the fun-happy panics that result:

panic: TLB IPI rendezvous failed (mask 2)
Stopped in pid 315.1 (ypbind) at        netbsd:cpu_Debugger+0xa:        leave
db{0}> tr
cpu_Debugger() at netbsd:cpu_Debugger+0xa
panic() at netbsd:panic+0x1cd
pmap_tlb_shootnow() at netbsd:pmap_tlb_shootnow+0x14c
pmap_enter() at netbsd:pmap_enter+0x400
uvm_fault() at netbsd:uvm_fault+0x6b9
uvm_fault_wire() at netbsd:uvm_fault_wire+0x58
sysctl_lock() at netbsd:sysctl_lock+0x7a
sys___sysctl() at netbsd:sys___sysctl+0xb1
syscall_plain() at netbsd:syscall_plain+0x100
--- syscall (number 82) ---
0x2006047b0:

or:

panic: TLB IPI rendezvous failed (mask 8)
Stopped in pid 442.1 (raid_parity) at   netbsd:cpu_Debugger+0xa:        leave
db{0}> tr
cpu_Debugger() at netbsd:cpu_Debugger+0xa
panic() at netbsd:panic+0x1cd
pmap_tlb_shootnow() at netbsd:pmap_tlb_shootnow+0x14c
pmap_kremove() at netbsd:pmap_kremove+0xa0
uvm_km_free() at netbsd:uvm_km_free+0xae
free() at netbsd:free+0x290
rf_FreeAllocList() at netbsd:rf_FreeAllocList+0x7b
rf_VerifyParityRAID1() at netbsd:rf_VerifyParityRAID1+0x6c0
rf_VerifyParity() at netbsd:rf_VerifyParity+0x76
rf_RewriteParity() at netbsd:rf_RewriteParity+0xa9
rf_RewriteParityThread() at netbsd:rf_RewriteParityThread+0x55
db{0}> 

or: 

panic: TLB IPI rendezvous failed (mask 8)
Stopped in pid 1234.1 (cc1) at  netbsd:cpu_Debugger+0xa:        leave
db{0}> tr
cpu_Debugger() at netbsd:cpu_Debugger+0xa
panic() at netbsd:panic+0x1cd
pmap_tlb_shootnow() at netbsd:pmap_tlb_shootnow+0x14c
pmap_do_remove() at netbsd:pmap_do_remove+0xbe
ubc_alloc() at netbsd:ubc_alloc+0x3ae
ffs_read() at netbsd:ffs_read+0x308
VOP_READ() at netbsd:VOP_READ+0x36
vn_read() at netbsd:vn_read+0x9f
dofileread() at netbsd:dofileread+0x97
sys_read() at netbsd:sys_read+0x83
syscall_plain() at netbsd:syscall_plain+0x100
uvm_fault(0xffff80002db6db08, 0x0, 0, 1) -> e
kernel: page fault trap, code=0
Faulted in DDB; continuing...
db{0}> 

Machine runs fine with the same kernel so long as profiling is
not turned on.

>Fix:
please :)