Subject: Fatal Kernel Trap in XentMM()
To: Port-Alpha2 <port-alpha@mail.netbsd.org>
From: Bill Dorsey <noyb@lila.com>
List: port-alpha
Date: 06/07/2001 09:11:03
Hi,

This is the third time my miata running NetBSD 1.5 has crashed with a
memory management fault so I went ahead and recorded all the details in
the hopes it might be useful for debugging.

The crash seems to occur at random times.  My system is never totally
idle, with several programs using about 20% of the CPU at most times.
This most recent crash occurred at 3:18 in the morning.  For what it's
worth, I usually have about 180+ megabytes of free memory according to
top(1) most of the time.

After rebooting, the machine runs flawlessly for 1-2 weeks before I
see the next crash.  It has become a problem in the last couple of
months which coincides with my beginning to use the machine to
perform various computational tasks that use up to 20% of the CPU
around the clock (they are I/O intensive, but they don't use a lot
of memory).

First, the system:

NetBSD 1.5 (SPITFIRE) #5: Mon Mar 26 14:13:02 PST 2001
    dorsey@spitfire:/usr/src/sys/arch/alpha/compile/SPITFIRE
Digital Personal WorkStation 533au, 531MHz
8192 byte page size, 1 processor.
total memory = 256 MB
(2264 KB reserved for PROM, 253 MB used by NetBSD)
avail memory = 230 MB
using 1637 buffers containing 13096 KB of memory
mainbus0 (root)
cpu0 at mainbus0: ID 0 (primary), 21164A-0 (unknown minor type 0)
cpu0: Architecture extensions: 1<BWX>
cia0 at mainbus0: DECchip 2117x Core Logic Chipset (Pyxis), pass 1
cia0: extended capabilities: 111<WLEN,MWEN,BWEN>
cia0: using BWX for PCI config access
cia0: WARNING: Pyxis pass 1 DMA bug; no bets...
pci0 at cia0 bus 0
pci0: i/o space, memory space enabled

BTW, the CPU is not being overclocked.

And here's the information from the debugger from the latest crash:

fatal kernel trap:

    trap entry = 0x2 (memory management fault)
    a0         = 0x7f7f7c80810edba0
    a1         = 0x1
    a2         = 0x0
    pc         = 0xfffffc00003e27a4
    ra         = 0xfffffc00003e0088
    curproc    = 0xfffffc0005f3b688
        pid = 34373, comm = ftpd

panic: trap
Stopped in ftpd at cpu_Debugger + 0x4:  ret zero,(ra)

db> trace

cpu_Debugger() at cpu_Debugger() + 0x4
panic() at panic() + 0xfc
trap() at trap() + 0x51c
XentMM() at XentMM() + 0x20
--- memory management fault (from ipl 0) ---
getsock() at getsock() + 0x24
sys_accept() at sys_accept() + 0xa8
syscall() at syscall() + 0x1dc
XentSys() at XentSys() + 0x50
--- syscall (30, netbsd.sys_accept) ---
--- user mode ---

db> sync
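
In case it helps anyone reading the trap dump above, here is a rough
sketch of how the a0/a1/a2 values might be decoded.  The MMCSR and cause
tables are my assumption based on the OSF/1 PALcode conventions (as
mirrored in NetBSD's alpha_cpu.h), and the 43-bit virtual address width
is assumed for the 21164; the function names are just illustrative.

```python
# Sketch: decode the a0/a1/a2 arguments of an Alpha memory management fault.
# ASSUMPTION: encodings follow the OSF/1 PALcode conventions; verify them
# against alpha_cpu.h in your own source tree before relying on this.

MMCSR = {
    0: "translation not valid",
    1: "access violation",
    2: "fault on read",
    3: "fault on execute",
    4: "fault on write",
}

CAUSE = {-1: "instruction fetch", 0: "load", 1: "store"}


def to_signed64(value):
    """Interpret a 64-bit register value as a signed integer."""
    return value - (1 << 64) if value & (1 << 63) else value


def is_canonical_va(va, va_bits=43):
    """True if every bit above bit (va_bits - 1) is a copy of that bit.

    ASSUMPTION: va_bits=43 matches the 21164's implemented VA width.
    """
    top = va >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return top == 0 or top == all_ones


def decode_mm_fault(a0, a1, a2):
    """Return a readable summary of the three trap arguments."""
    return {
        "fault_va": hex(a0),
        "canonical": is_canonical_va(a0),
        "mmcsr": MMCSR.get(a1, "unknown"),
        "cause": CAUSE.get(to_signed64(a2), "unknown"),
    }


if __name__ == "__main__":
    # Values copied from the trap dump in this message.
    print(decode_mm_fault(0x7F7F7C80810EDBA0, 0x1, 0x0))
```

Fed the values from this crash, the sketch reports an access violation
on a load from an address that is not a sign-extended 43-bit address,
while the pc value passes the same check.  That would be consistent with
getsock() following a bad pointer, but I have not confirmed it.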

I also have a copy of the kernel core files from this and the
previous crash (which was also a memory management fault, but
the trace was different below the --- memory management fault
--- line).  They are rather large files (45-48 megabytes each),
but I'd be willing to upload them somewhere if they would be
useful for debugging.

I have been contemplating upgrading to BETA_2 in the hopes that
it might solve this problem, but since we're presumably so
close to the 1.5.1 release, I have been holding out.  I thought
I'd post the details of this just in case it's something new
that hasn't been fixed yet.

--
Bill Dorsey