Subject: NetBSD/amd64 4.0 panic
To: None <current-users@netbsd.org>
From: Martti Kuparinen <martti.kuparinen@iki.fi>
List: current-users
Date: 12/20/2007 11:23:14
Hi,

Our production server, a Dell PowerEdge 2900 running NetBSD/amd64 4.0, just
crashed. These are its boot messages:

NetBSD 4.0 (P130) #0: Mon Dec 17 10:37:52 EET 2007
         root@p130.mydomain.net:/usr/src/sys/arch/amd64/compile/P130
total memory = 4095 MB
avail memory = 3939 MB
timecounter: Timecounters tick every 10.000 msec
timecounter: Timecounter "i8254" frequency 1193182 Hz quality 100
mainbus0 (root)
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Xeon(R) CPU            5140  @ 2.33GHz, 2327.58 MHz
cpu0: features: bffbfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu0: features: bffbfbff<PGE,MCA,CMOV,PAT,PSE36,CFLUSH,B20,DS,ACPI,MMX>
cpu0: features: bffbfbff<FXSR,SSE,SSE2,SS,HTT,TM,SBF>
cpu0: features2: 4e3bd<SSE3,MONITOR,DS-CPL,VMX,EST,TM2,xTPR>
cpu0: features3: bffbfbff<SYSCALL/SYSRET,XD,EM64T>
cpu0: L2 cache 4 MB 64B/line 16-way
cpu0: calibrating local timer
cpu0: apic clock running at 332 MHz
cpu0: 64 page colors
cpu1 at mainbus0: apid 1 (application processor)
cpu1: starting
cpu1: Intel(R) Xeon(R) CPU            5140  @ 2.33GHz, 2327.50 MHz
cpu1: features: bffbfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu1: features: bffbfbff<PGE,MCA,CMOV,PAT,PSE36,CFLUSH,B20,DS,ACPI,MMX>
cpu1: features: bffbfbff<FXSR,SSE,SSE2,SS,HTT,TM,SBF>
cpu1: features2: 4e3bd<SSE3,MONITOR,DS-CPL,VMX,EST,TM2,xTPR>
cpu1: features3: bffbfbff<SYSCALL/SYSRET,XD,EM64T>
cpu1: L2 cache 4 MB 64B/line 16-way
ioapic0 at mainbus0 apid 2 (I/O APIC)
ioapic0: pa 0xfec00000, version 20, 24 pins
ioapic0: misconfigured as apic 0
ioapic0: remapped to apic 2
ioapic1 at mainbus0 apid 3 (I/O APIC)
ioapic1: pa 0xfec80000, version 20, 24 pins
ioapic1: misconfigured as apic 0
ioapic1: remapped to apic 3
ioapic2 at mainbus0 apid 4 (I/O APIC)
ioapic2: pa 0xfec83000, version 20, 24 pins
ioapic2: misconfigured as apic 0
ioapic2: remapped to apic 4
acpi0 at mainbus0: Advanced Configuration and Power Interface
acpi0: using Intel ACPI CA subsystem version 20060217
acpi0: X/RSDT: OemId <DELL  ,PE_SC3  ,00000001>, AslId <DELL,00000001>
acpi0: SCI interrupting at int 9
acpi0: fixed-feature power button present
timecounter: Timecounter "ACPI-Fast" frequency 3579545 Hz quality 1000
ACPI-Fast 24-bit timer
ACPI Object Type 'Processor' (0x0c) at acpi0 not configured
ACPI Object Type 'Processor' (0x0c) at acpi0 not configured
ACPI Object Type 'Processor' (0x0c) at acpi0 not configured
ACPI Object Type 'Processor' (0x0c) at acpi0 not configured
ACPI Object Type 'Processor' (0x0c) at acpi0 not configured
ACPI Object Type 'Processor' (0x0c) at acpi0 not configured
ACPI Object Type 'Processor' (0x0c) at acpi0 not configured
ACPI Object Type 'Processor' (0x0c) at acpi0 not configured
PNP0A03 [PCI/PCI-X Host Bridge] at acpi0 not configured
PNP0200 [AT DMA Controller] at acpi0 not configured
PNP0C04 [Math Coprocessor] at acpi0 not configured
PNP0000 [AT Interrupt Controller] at acpi0 not configured
PNP0C01 [System Board] at acpi0 not configured
PNP0B00 [AT Real-Time Clock] at acpi0 not configured
attimer1 at acpi0 (PNP0100): AT Timer
attimer1: io 0x40-0x5f irq 0
PNP0501 [16550A-compatible COM port] at acpi0 not configured
PNP0501 [16550A-compatible COM port] at acpi0 not configured
PNP0C01 [System Board] at acpi0 not configured
PNP0C01 [System Board] at acpi0 not configured
PNP0C02 [Plug and Play motherboard register resources] at acpi0 not configured
PNP0103 [HPET Timer] at acpi0 not configured
PNP0C0F [PCI interrupt link device] at acpi0 not configured
PNP0C0F [PCI interrupt link device] at acpi0 not configured
PNP0C0F [PCI interrupt link device] at acpi0 not configured
PNP0C0F [PCI interrupt link device] at acpi0 not configured
PNP0C0F [PCI interrupt link device] at acpi0 not configured
...
mfi0 at pci9 dev 14 function 0: ioapic2 pin 14 (irq 5)
mfi0: logical drives 1, version 5.1.1-0040, 256MB RAM
scsibus0 at mfi0: 64 targets, 8 luns per target
...
sd0 at scsibus0 target 0 lun 0: <DELL, PERC 5/i, 1.03> disk fixed
sd0: fabricating a geometry
sd0: 232 GB, 237824 cyl, 64 head, 32 sec, 512 bytes/sect x 487063552 sectors


From the crash dump I got this:

# dmesg -M /var/crash/netbsd.8.core
...
sd0(mfi0:0:0:0): unable to allocate scsipi_xfer
sd0(mfi0:0:0:0): unable to allocate scsipi_xfer
sd0(mfi0:0:0:0): unable to allocate scsipi_xfer
panic: bnx0: Double mbuf allocation failure!
Begin traceback...
?() at 0xffff8000099f7034
uvm_fault(0xffffffff80b6a3e0, 0x0, 1) -> e
fatal page fault in supervisor mode
trap type 6 code 0 rip ffffffff804c9793 cs 8 rflags 10246 cr2  7 cpl 7 rsp ffff8000549da928
panic: trap


gdb cannot tell me anything useful either:

# gdb netbsd.gdb
(gdb) target kcore /var/crash/netbsd.8.core
panic: %s: Double mbuf allocation failure!
#0  sddump () at ../../../../dev/scsipi/sd.c:1496
1496    {
(gdb) bt
#0  sddump () at ../../../../dev/scsipi/sd.c:1496
can not access 0xa, invalid translation (invalid PTE)
can not access 0xa, invalid translation (invalid PTE)
can not access 0xa, invalid translation (invalid PTE)
can not access 0xa, invalid translation (invalid PTE)
#1  0xffffffff804cd4f9 in dumpsys ()
     at ../../../../arch/amd64/amd64/machdep.c:840


Is anyone else having problems with their new 4.0 servers?

Martti