Subject: kern/32798: kernel page fault
To: kern-bug-people@netbsd.org, gnats-admin@netbsd.org
From: martin@aprisoft.de
List: netbsd-bugs
Date: 02/11/2006 20:40:01
>Number:         32798
>Category:       kern
>Synopsis:       kernel page fault
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Feb 11 20:40:01 +0000 2006
>Originator:     martin@aprisoft.de
>Release:        NetBSD 3.99.15
>Organization:
>Environment:
System: NetBSD martins.aprisoft.de 3.99.15 NetBSD 3.99.15 (MARTINS) #0: Sat Feb 11 00:33:04 CET 2006 martin@martins.aprisoft.de:/usr/src/sys/arch/amd64/compile/MARTINS amd64
Architecture: x86_64
Machine: amd64
>Description:
On this machine I can reliably panic the kernel by doing a cvs checkout of the
NetBSD source tree. The machine has 4GB RAM, some of which is mapped above the
4GB physical address boundary (i386 complains about this, that's how I know).

I'm not sure the panic is exactly the same every time, but it is always in
pool-related code (earlier kernels were built without -fno-omit-frame-pointer,
so ddb backtraces were not completely helpful). This is basically a GENERIC.MP
kernel, plus DDB, LOCKDEBUG, DEBUG, DIAGNOSTIC and ACPI support.

The panic (manually transcribed) is:

uvm_fault(0xffff8000194bb498, 0x0, 0, 2) -> e
kernel: page fault trap, code 0
pool_prime_page+0x1f6: movq %rax,0x10(%rdx)

Excerpt of register values:

rax = 0xffff800020fe1078
rdx = 0xffffffffffffffff
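
As a cross-check of the values above, here is a minimal sketch of the address
arithmetic for the faulting store. The helper names are hypothetical and the
4 KB page size is the normal amd64 base page size; nothing here is kernel code.
With rdx all ones, the operand 0x10(%rdx) wraps around to 0xf, which lies in
page 0, and that would be consistent with the 0x0 address in the uvm_fault line.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Effective address of a base+displacement operand such as 0x10(%rdx).
 * Unsigned 64-bit addition wraps modulo 2^64, which models the wraparound
 * of the address calculation in 64-bit mode.
 */
static uint64_t effective_address(uint64_t base, uint64_t disp)
{
    return base + disp;
}

/* Round an address down to the start of its page (page_size: power of two). */
static uint64_t page_base(uint64_t addr, uint64_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return addr & ~(page_size - 1);
}
```

With the reported register value, effective_address(0xffffffffffffffff, 0x10)
gives 0xf, and page_base(0xf, 4096) gives 0.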

Backtrace:
pool_prime_page+0x1f6
pool_get+0x101
ffs_vget+0xdb
ffs_valloc+0xe8
ufs_makeinode+0x6c
ufs_create
VOP_CREATE
vn_open
sys_open
syscall_plain

The place in pool_prime_page where it happens is:

(gdb) list *(pool_prime_page+0x1f6)
0xffffffff803f65d6 is in pool_prime_page (../../../../kern/subr_pool.c:1334).
1329                            pi = (struct pool_item *)cp;
1330    
1331                            KASSERT(((((vaddr_t)pi) + ioff) & (align - 1)) == 0);
1332    
1333                            /* Insert on page list */
1334                            LIST_INSERT_HEAD(&ph->ph_itemlist, pi, pi_list);
1335    #ifdef DIAGNOSTIC
1336                            pi->pi_magic = PI_MAGIC;
1337    #endif
1338                            cp = (caddr_t)(cp + pp->pr_size);

and assembler context:
0xffffffff803f65af <pool_prime_page+463>:       mov    %r15d,%r8d
0xffffffff803f65b2 <pool_prime_page+466>:       lea    0x38(%r13),%rsi
0xffffffff803f65b6 <pool_prime_page+470>:       dec    %ecx
0xffffffff803f65b8 <pool_prime_page+472>:       lea    (%r8,%rbx,1),%rax
0xffffffff803f65bc <pool_prime_page+476>:       test   %rax,%rcx
0xffffffff803f65bf <pool_prime_page+479>:       jne    0xffffffff803f65fc <pool_prime_page+540>
0xffffffff803f65c1 <pool_prime_page+481>:       mov    0x38(%r13),%rax
0xffffffff803f65c5 <pool_prime_page+485>:       test   %rax,%rax
0xffffffff803f65c8 <pool_prime_page+488>:       mov    %rax,0x8(%rbx)
0xffffffff803f65cc <pool_prime_page+492>:       je     0xffffffff803f65da <pool_prime_page+506>
0xffffffff803f65ce <pool_prime_page+494>:       mov    0x38(%r13),%rdx
0xffffffff803f65d2 <pool_prime_page+498>:       lea    0x8(%rbx),%rax
0xffffffff803f65d6 <pool_prime_page+502>:       mov    %rax,0x10(%rdx)
                              >>>>>>>>>>        ~~~~~~~~~~~~~~~~~~~~~~
0xffffffff803f65da <pool_prime_page+506>:       movl   $0xdeadbeef,(%rbx)
0xffffffff803f65e0 <pool_prime_page+512>:       dec    %edi
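
To relate the assembler to the source line, here is a user-space sketch of a
head insertion in the usual BSD <sys/queue.h> LIST style. The struct and
function names are hypothetical stand-ins, and the layout (a 32-bit magic
followed by the list entry) is an assumption inferred from the disassembly
above, not copied from subr_pool.c. On an LP64 target this puts le_prev at
offset 0x10, matching the 0x10(%rdx) operand. Read this way, the faulting
store is the update of the old first element's back pointer, which would mean
the list head itself held 0xffffffffffffffff when it was read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the layout is inferred, not taken from the kernel. */
struct item;
struct item_link {
    struct item *le_next;   /* next element */
    struct item **le_prev;  /* address of the previous next-pointer */
};
struct item {
    unsigned int magic;     /* assumed DIAGNOSTIC-only field at offset 0 */
    struct item_link link;  /* offset 8 on LP64 */
};
struct item_head {
    struct item *lh_first;
};

/* Same steps as a LIST-style head insertion, written out by hand. */
static void insert_head(struct item_head *head, struct item *elm)
{
    assert(head != NULL && elm != NULL);
    /* mov 0x38(%r13),%rax ; mov %rax,0x8(%rbx) */
    if ((elm->link.le_next = head->lh_first) != NULL) {
        /* mov 0x38(%r13),%rdx ; lea 0x8(%rbx),%rax ; mov %rax,0x10(%rdx) */
        head->lh_first->link.le_prev = &elm->link.le_next;
    }
    head->lh_first = elm;
    elm->link.le_prev = &head->lh_first;
}

/* Offset of le_prev inside struct item (0x10 on LP64 targets). */
static size_t le_prev_offset(void)
{
    return offsetof(struct item, link) + offsetof(struct item_link, le_prev);
}
```

If the head pointer is garbage but non-NULL, the NULL test passes and the
store through it faults, which is the pattern seen at pool_prime_page+0x1f6.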

Here is a dmesg from the machine:

Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006
    The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
    The Regents of the University of California.  All rights reserved.

NetBSD 3.99.15 (MARTINS) #0: Sat Feb 11 14:34:33 CET 2006
	martin@martins.aprisoft.de:/usr/src/sys/arch/amd64/compile/MARTINS
total memory = 4095 MB
avail memory = 3852 MB
mainbus0 (root)
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: AMD Opteron(tm) Processor 248, 2210.29 MHz
cpu0: features: e7dbfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu0: features: e7dbfbff<PGE,MCA,CMOV,PAT,PSE36,MPC,NOX,MMXX,MMX>
cpu0: features: e7dbfbff<FXSR,SSE,SSE2,LONG,3DNOW2,3DNOW>
cpu0: I-cache 64 KB 64B/line 2-way, D-cache 64 KB 64B/line 2-way
cpu0: L2 cache 1 MB 64B/line 16-way
cpu0: ITLB 32 4 KB entries fully associative, 8 4 MB entries fully associative
cpu0: DTLB 32 4 KB entries fully associative, 8 4 MB entries fully associative
cpu0: calibrating local timer
cpu0: apic clock running at 200 MHz
cpu0: 16 page colors
cpu1 at mainbus0: apid 1 (application processor)
cpu1: starting
cpu1: AMD Opteron(tm) Processor 248, 2210.19 MHz
cpu1: features: e7dbfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu1: features: e7dbfbff<PGE,MCA,CMOV,PAT,PSE36,MPC,NOX,MMXX,MMX>
cpu1: features: e7dbfbff<FXSR,SSE,SSE2,LONG,3DNOW2,3DNOW>
cpu1: I-cache 64 KB 64B/line 2-way, D-cache 64 KB 64B/line 2-way
cpu1: L2 cache 1 MB 64B/line 16-way
cpu1: ITLB 32 4 KB entries fully associative, 8 4 MB entries fully associative
cpu1: DTLB 32 4 KB entries fully associative, 8 4 MB entries fully associative
ioapic0 at mainbus0 apid 2 (I/O APIC)
ioapic0: pa 0xfec00000, version 11, 24 pins
ioapic1 at mainbus0 apid 3 (I/O APIC)
ioapic1: pa 0xd8000000, version 11, 4 pins
ioapic2 at mainbus0 apid 4 (I/O APIC)
ioapic2: pa 0xd8001000, version 11, 4 pins
mainbus0: Intel MP Specification (Version 1.4) (nVIDIA   CK804-2P    )
mpbios: bus 0 is type PCI   
mpbios: bus 1 is type PCI   
mpbios: bus 2 is type PCI   
mpbios: bus 8 is type PCI   
mpbios: bus 128 is type PCI   
mpbios: bus 129 is type ISA   
acpi0 at mainbus0
acpi0: using Intel ACPI CA subsystem version 20060113
acpi0: X/RSDT: OemId <PTLTD ,  RSDT  ,06040000>, AslId < LTP,00000000>
acpi0: SCI interrupting at int 9
acpi0: fixed-feature power button present
acpi: WARNING: no matching I/O apic for SCI, assuming ioapic0
ACPI Object Type 'Processor' (0x0c) at acpi0 not configured
ACPI Object Type 'Processor' (0x0c) at acpi0 not configured
ACPI Object Type 'Processor' (0x0c) at acpi0 not configured
ACPI Object Type 'Processor' (0x0c) at acpi0 not configured
PNP0C0C [ACPI power button device] at acpi0 not configured
PNP0C01 [System Board] at acpi0 not configured
PNP0A03 [PCI Bus] at acpi0 not configured
PNP0C02 [Plug and Play motherboard register resources] at acpi0 not configured
PNP0C02 [Plug and Play motherboard register resources] at acpi0 not configured
PNP0000 [AT Interrupt Controller] at acpi0 not configured
PNP0100 [AT Timer] at acpi0 not configured
PNP0200 [AT DMA Controller] at acpi0 not configured
PNP0800 [AT-style speaker sound] at acpi0 not configured
PNP0B00 [AT Real-Time Clock] at acpi0 not configured
PNP0C04 [Math Coprocessor] at acpi0 not configured
PNP0A05 [Generic ACPI Bus] at acpi0 not configured
PNP0501 [16550A-compatible COM port] at acpi0 not configured
PNP0700 [PC standard floppy disk controller] at acpi0 not configured
PNP0303 [IBM Enhanced (101/102-key, PS/2 mouse support)] at acpi0 not configured
PNP0F13 [PS/2 Port for PS/2-style Mice] at acpi0 not configured
_NVRAIDBU at acpi0 not configured
PNP0C0F [PCI interrupt link device] at acpi0 not configured
PNP0C0F [PCI interrupt link device] at acpi0 not configured
PNP0C0F [PCI interrupt link device] at acpi0 not configured
PNP0C0F [PCI interrupt link device] at acpi0 not configured
PNP0C0F [PCI interrupt link device] at acpi0 not configured
PNP0C0F [PCI interrupt link device] at acpi0 not configured
PNP0C0F [PCI interrupt link device] at acpi0 not configured
PNP0C0F [PCI interrupt link device] at acpi0 not configured
PNP0C0F [PCI interrupt link device] at acpi0 not configured
PNP0A03 [PCI Bus] at acpi0 not configured
PNP0A03 [PCI Bus] at acpi0 not configured
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
NVIDIA nForce4 Memory Controller (miscellaneous memory, revision 0xa3) at pci0 dev 0 function 0 not configured
pcib0 at pci0 dev 1 function 0
pcib0: NVIDIA nForce4 PCI-ISA bridge (rev. 0xa3)
NVIDIA nForce4 SMBus (SMBus serial bus, revision 0xa2) at pci0 dev 1 function 1 not configured
ohci0 at pci0 dev 2 function 0: NVIDIA nForce4 USB Host Controller (rev. 0xa2)
ohci0: interrupting at ioapic0 pin 10 (irq 10)
ohci0: OHCI version 1.0, legacy support
usb0 at ohci0: USB revision 1.0
uhub0 at usb0
uhub0: NVIDIA OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 10 ports with 10 removable, self powered
ehci0 at pci0 dev 2 function 1: NVIDIA nForce4 USB2 Host Controller (rev. 0xa3)
ehci0: interrupting at ioapic0 pin 7 (irq 7)
ehci0: BIOS refuses to give up ownership, using force
ehci0: EHCI version 1.0
ehci0: companion controller, 4 ports each: ohci0
usb1 at ehci0: USB revision 2.0
uhub1 at usb1
uhub1: NVIDIA EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub1: 10 ports with 10 removable, self powered
auich0 at pci0 dev 4 function 0: nForce4 AC-97 Audio
auich0: interrupting at ioapic0 pin 12 (irq 12)
auich0: ac97: Analog Devices AD1981B codec; headphone, 20 bit DAC, no 3D stereo
auich0: ac97: ext id 605<AC97_22,AMAP,SPDIF,VRA>
viaide0 at pci0 dev 6 function 0
viaide0: NVIDIA nForce4 IDE Controller (rev. 0xf2)
viaide0: bus-master DMA support present
viaide0: primary channel configured to compatibility mode
viaide0: primary channel interrupting at ioapic0 pin 14 (irq 14)
atabus0 at viaide0 channel 0
viaide0: secondary channel configured to compatibility mode
viaide0: secondary channel interrupting at ioapic0 pin 15 (irq 15)
atabus1 at viaide0 channel 1
viaide1 at pci0 dev 7 function 0
viaide1: NVIDIA nForce4 Serial ATA Controller (rev. 0xf3)
viaide1: bus-master DMA support present
viaide1: primary channel wired to native-PCI mode
viaide1: using ioapic0 pin 11 (irq 11) for native-PCI interrupt
atabus2 at viaide1 channel 0
viaide1: secondary channel wired to native-PCI mode
atabus3 at viaide1 channel 1
viaide2 at pci0 dev 8 function 0
viaide2: NVIDIA nForce4 Serial ATA Controller (rev. 0xf3)
viaide2: bus-master DMA support present
viaide2: primary channel wired to native-PCI mode
viaide2: using ioapic0 pin 10 (irq 10) for native-PCI interrupt
atabus4 at viaide2 channel 0
viaide2: secondary channel wired to native-PCI mode
atabus5 at viaide2 channel 1
ppb0 at pci0 dev 9 function 0: NVIDIA nForce4 PCI Host Bridge (rev. 0xa2)
pci1 at ppb0 bus 1
pci1: i/o space, memory space enabled
tlp0 at pci1 dev 4 function 0: Macronix MX98713 Ethernet, pass 0.0
tlp0: interrupting at ioapic0 pin 10 (irq 10)
tlp0: Ethernet address 00:40:05:50:ee:9b
tlp0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX
fwohci0 at pci1 dev 5 function 0: Texas Instruments TSB43AA22/A IEEE 1394 Host Controller (rev. 0x00)
fwohci0: interrupting at ioapic0 pin 12 (irq 12)
fwohci0: OHCI version 1.10 (ROM=1)
fwohci0: No. of Isochronous channels is 4.
fwohci0: EUI64 00:e0:81:00:00:23:65:95
fwohci0: Phy 1394a available S400, 2 ports.
fwohci0: Link S400, max_rec 2048 bytes.
ieee1394if0 at fwohci0: IEEE1394 bus
fwip0 at ieee1394if0: IP over IEEE1394
fwohci0: Initiate bus reset
NVIDIA nForce4 Ethernet (miscellaneous bridge, revision 0xa3) at pci0 dev 10 function 0 not configured
ppb1 at pci0 dev 14 function 0: NVIDIA nForce4 PCIe Host Bridge (rev. 0xa3)
pci2 at ppb1 bus 2
pci2: i/o space, memory space enabled, rd/line, wr/inv ok
vga0 at pci2 dev 0 function 0: ATI Technologies Radeon X700 Pro (rev. 0x00)
vga0: WARNING: ignoring 64-bit BAR @ 0x10
vga0: WARNING: ignoring 64-bit BAR @ 0x18
wsdisplay0 at vga0 kbdmux 1: console (80x25, vt100 emulation)
wsmux1: connecting to wsdisplay0
ATI Technologies Radeon X700 Pro Secondary (miscellaneous display) at pci2 dev 0 function 1 not configured
pchb0 at pci0 dev 24 function 0
pchb0: Advanced Micro Devices AMD64 HyperTransport configuration (rev. 0x00)
pchb1 at pci0 dev 24 function 1
pchb1: Advanced Micro Devices AMD64 Address Map configuration (rev. 0x00)
pchb2 at pci0 dev 24 function 2
pchb2: Advanced Micro Devices AMD64 DRAM configuration (rev. 0x00)
pchb3 at pci0 dev 24 function 3
pchb3: Advanced Micro Devices AMD64 Miscellaneous configuration (rev. 0x00)
pchb4 at pci0 dev 25 function 0
pchb4: Advanced Micro Devices AMD64 HyperTransport configuration (rev. 0x00)
pchb5 at pci0 dev 25 function 1
pchb5: Advanced Micro Devices AMD64 Address Map configuration (rev. 0x00)
pchb6 at pci0 dev 25 function 2
pchb6: Advanced Micro Devices AMD64 DRAM configuration (rev. 0x00)
pchb7 at pci0 dev 25 function 3
pchb7: Advanced Micro Devices AMD64 Miscellaneous configuration (rev. 0x00)
isa0 at pcib0
com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
pckbc0 at isa0 port 0x60-0x64
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pmsprobe: reset error 5
attimer0 at isa0 port 0x40-0x43: AT Timer
pcppi0 at isa0 port 0x61
midi0 at pcppi0: PC speaker
sysbeep0 at pcppi0
fdc0 at isa0 port 0x3f0-0x3f7 irq 6 drq 2
pcppi0: attached to attimer0
ioapic0: enabling
ioapic1: enabling
ioapic2: enabling
fwohci0: node_id=0xc800ffc0, gen=1, CYCLEMASTER mode
ieee1394if0: 1 nodes, maxhop <= 0, cable IRM = 0 (me)
ieee1394if0: bus manager 0 (me)
auich0: measured ac97 link rate at 48001 Hz, will use 48000 Hz
audio0 at auich0: full duplex, mmap, independent
raidattach: Asked for 8 units
Kernelized RAIDframe activated
atapibus0 at atabus0: 2 targets
cd0 at atapibus0 drive 1: <LITE-ON DVD SOHD-167T, , 9S19> cdrom removable
cd0: 32-bit data port
cd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 2 (Ultra/33)
cd0(viaide0:0:1): using PIO mode 4, Ultra-DMA mode 2 (Ultra/33) (using DMA)
ehci0: handing over low speed device on port 2 to ohci0
uhub1: port 2, device disappeared after reset
uhidev0 at uhub0 port 2 configuration 1 interface 0
uhidev0: Microsoft Microsoft IntelliMouse M-BM-. with IntelliEye, rev 1.10/1.00, addr 2, iclass 3/1
ums0 at uhidev0: 3 buttons and Z dir.
wsmouse0 at ums0 mux 0
wd0 at atabus2 drive 0: <WDC WD1600JD-00HBB0>
wd0: drive supports 16-sector PIO transfers, LBA48 addressing
wd0: 149 GB, 310101 cyl, 16 head, 63 sec, 512 bytes/sect x 312581808 sectors
wd0: 32-bit data port
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
wd0(viaide1:0:0): using PIO mode 4, Ultra-DMA mode 6 (Ultra/133) (using DMA)
wd1 at atabus4 drive 0: <WDC WD740GD-00FLA2>
wd1: drive supports 16-sector PIO transfers, LBA48 addressing
wd1: 70911 MB, 144073 cyl, 16 head, 63 sec, 512 bytes/sect x 145226112 sectors
wd1: 32-bit data port
wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
wd1(viaide2:0:0): using PIO mode 4, Ultra-DMA mode 6 (Ultra/133) (using DMA)
wd2 at atabus5 drive 0: <WDC WD740GD-00FLA2>
wd2: drive supports 16-sector PIO transfers, LBA48 addressing
wd2: 70911 MB, 144073 cyl, 16 head, 63 sec, 512 bytes/sect x 145226112 sectors
wd2: 32-bit data port
wd2: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
wd2(viaide2:1:0): using PIO mode 4, Ultra-DMA mode 6 (Ultra/133) (using DMA)
Searching for RAID components...
boot device: wd0
root on wd0a dumps on wd0b
mountroot: trying lfs...
mountroot: trying ffs...
root file system type: ffs
cpu1: CPU 1 running
init: copying out path `/sbin/init' 11
wsdisplay0: screen 1 added (80x25, vt100 emulation)
wsdisplay0: screen 2 added (80x25, vt100 emulation)
wsdisplay0: screen 3 added (80x25, vt100 emulation)
wsdisplay0: screen 4 added (80x25, vt100 emulation)

>How-To-Repeat:
(This may well be amd64-specific, but since the crashes happen in generic
kernel code, I chose category "kern" instead. Quentin Garnier tried to
reproduce it on an amd64 machine with 8GB RAM and it worked fine for him, so
something specific to my setup is probably involved.)

On this machine it is reproducible by:

 - boot a GENERIC.MP kernel
 - cd /tmp && cvs -d anoncvs.netbsd.org:/cvsroot co src

>Fix:
no idea