Subject: Whatever happened to legendary NetBSD reliability?
To: NetBSD/Alpha <port-alpha@netbsd.org>
From: Bill Dorsey <dorsey@lila.com>
List: port-alpha
Date: 07/23/2001 22:37:45
Hi,

Since upgrading to NetBSD 1.5 (and more recently NetBSD 1.5_BETA2) my PWS
has yet to make it to 30 days of uptime.  Prior to that with the 1.4 kernel
I had NEVER seen a kernel panic, but now I get one every 1-3 weeks.  Still
more reliable than Windoze, but not by much.  Here's the output from
yesterday's panic on the 1.5_BETA2 kernel:

/netbsd: fatal kernel trap:
/netbsd:
/netbsd:     trap entry = 0x2 (memory management fault)
/netbsd:     a0         = 0x7f7f7c80810fa248
/netbsd:     a1         = 0x1
/netbsd:     a2         = 0x0
/netbsd:     pc         = 0xfffffc00003cab1c
/netbsd:     ra         = 0xfffffc00003ca908
/netbsd:     curproc    = 0xfffffc00018cef00
/netbsd:         pid = 75, comm = syslogd
/netbsd:
/netbsd: panic: trap

panic: trap
Stopped in syslogd at cpu_Debugger + 0x4
panic() @ panic + 0xfc
trap() @ trap + 0x51c
XentMM() @ XentMM + 0x20
--- memory management fault (from IPL 0) ---
pollscan() @ pollscan + 0x7c
sys_poll() @ sys_poll + 0x228
syscall() @ sys_call + 0x1dc
Xentsys() @ Xentsys + 0x50
--- Syscall(209, netbsd.sys_poll) ---
--- user mode ---

And here's the first few lines of output from dmesg:

Digital Personal WorkStation 533au, 531MHz
8192 byte page size, 1 processor.
total memory = 256 MB
(2264 KB reserved for PROM, 253 MB used by NetBSD)
avail memory = 230 MB
using 1637 buffers containing 13096 KB of memory
mainbus0 (root)
cpu0 at mainbus0: ID 0 (primary), 21164A-0 (unknown minor type 0)
cpu0: Architecture extensions: 1<BWX>
cia0 at mainbus0: DECchip 2117x Core Logic Chipset (Pyxis), pass 1
cia0: extended capabilities: 111<WLEN,MWEN,BWEN>
cia0: using BWX for PCI config access
cia0: WARNING: Pyxis pass 1 DMA bug; no bets...
pci0 at cia0 bus 0
pci0: i/o space, memory space enabled
de0 at pci0 dev 3 function 0
de0: interrupting at dec 550 irq 0
de0: DEC 21142 [10-100Mb/s] pass 1.1
de0: address 00:00:f8:75:41:e3
de0: setting full duplex.
de0: enabling Full Duplex 100baseTX port
de0: setting full duplex.
de0: setting full duplex.
de0: setting full duplex.
de0: setting full duplex.
pciide0 at pci0 dev 4 function 0: CMD Technology PCI0646 (rev. 0x01)
pciide0: bus-master DMA support present
pciide0: primary channel wired to compatibility mode
[...]

If it helps, the panic occurred while the machine was experiencing
heavy network traffic and a moderate CPU load (load average around
2).

I'm going to upgrade to 1.5.1 shortly in the hope that it will be
more reliable for me.  Still, it seems unlikely that the problem was
fixed in the short time between the BETA2 release and the 1.5.1
release.

Any suggestions?

- Bill Dorsey