Subject: panic: kernel fault
To: None <port-sparc@NetBSD.ORG>
From: der Mouse <mouse@Collatz.McRCIM.McGill.EDU>
List: port-sparc
Date: 07/14/1995 10:19:05

Once I could write off as coincidence.  But this is the second time
this has happened, under almost exactly the same circumstances.

data fault: pc=f801aea8 addr=1004 ser=80<INVAL>
panic: kernel fault
Stopped at      0xf80e1e9c:     jmpl           [%o7 + 0x8], %g0
db> 

Each time, this happened when I ran ps, and it crashed partway through
printing the output.  (I was running under window(1), so it may have
happened when ps exited or some such.)

Here's ddb trace output from the first crash, with the "at" addresses
translated to symbolic form and the instruction thereat disassembled
(with Sun adb):

db> trace
?(f85c6900, 80, 1004, f801aea8, 4000c0, f9692b90) at 0xf80e0650
	_mem_access_fault+0x27c: call _panic
?(3, f9692fb0, 267c8, 0, 0, 15) at 0xf80063fc
	normal_mem_fault+0x28: call _mem_access_fault
?(0, 0, 100000, f801b360, f9692e98, 1000) at 0xf800670c
	softtrap+0x154: call _syscall
?(700, f9692e98, 0, f80b30bc, f85d1d00, a) at 0xf80b30d8
	_swread+0x1c: call _physio
?(f9692e08, f8048708, f8586100, f85c6500, 0, f85c64f8) at 0xf80487dc
	_spec_read+0xd4: jmpl %o3,%o7
?(f9692e08, f80a64e4, f80fa800, 5, f85f9900, f85e09c0) at 0xf80a6518
	_ufsspec_read+0x34: jmpl %o1,%o7
?(f86067c0, f9692e98, f85f9900, f80417a4, 4f000, 15) at 0xf804185c
	_vn_read+0xb8: jmpl %o1,%o7
?(9, f9692f28, f9692f20, f8023ad0, 0, 15) at 0xf8023b90
	_read+0xc0: jmpl %o3,%o7
?(3, f9692fb0, 267c8, 0, 0, 15) at 0xf80e0a64
	_syscall+0x1ec: jmpl %o3,%o7
?(5, 4f000, 1000, 0, ffffffff, 1000) at 0xf800670c
	softtrap+0x154: call _syscall
db> 
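
For anyone who wants to repeat the address translation without adb, here
is a minimal sketch of the lookup.  The symbol start addresses are not
read from a kernel image; they are an assumption derived by subtracting
each offset in the annotated trace from its "at" address (for example,
0xf80e0650 - 0x27c = 0xf80e03d4).  The names SYMBOLS, STARTS and
symbolize are made up for illustration.

```python
import bisect

# Symbol start addresses, derived from the annotated trace by
# subtracting each printed offset from its "at" address.  These are
# NOT taken from a real symbol table.
SYMBOLS = sorted([
    (0xf80063d4, "normal_mem_fault"),
    (0xf80065b8, "softtrap"),
    (0xf8023ad0, "_read"),
    (0xf80417a4, "_vn_read"),
    (0xf8048708, "_spec_read"),
    (0xf80a64e4, "_ufsspec_read"),
    (0xf80b30bc, "_swread"),
    (0xf80e03d4, "_mem_access_fault"),
    (0xf80e0878, "_syscall"),
])
STARTS = [start for start, _ in SYMBOLS]


def symbolize(addr):
    """Return 'symbol+0xoffset' for the nearest symbol at or below addr."""
    i = bisect.bisect_right(STARTS, addr) - 1
    if i < 0:
        # Below every known symbol: leave the address numeric.
        return hex(addr)
    start, name = SYMBOLS[i]
    return "%s+0x%x" % (name, addr - start)


for at in (0xf80e0650, 0xf80063fc, 0xf800670c, 0xf80b30d8):
    print(hex(at), symbolize(at))
```

With only nine entries, any address that falls in some other function is
attributed to the nearest listed symbol below it, so a real lookup needs
the full symbol list from the kernel actually running (nm output, say).
In particular the faulting pc, f801aea8, is not covered by this table.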

The second crash was very similar; in particular, the return addresses
in the call stack were identical.  Here's the output from that crash:

data fault: pc=f801aea8 addr=1004 ser=80<INVAL>
panic: kernel fault
Stopped at      0xf80e1e9c:     jmpl           [%o7 + 0x8], %g0
db> trace
?(f85c4c00, 80, 1004, f801aea8, 4000c4, f96a4b90) at 0xf80e0650
?(3, f96a4fb0, 267c8, 0, 0, 18) at 0xf80063fc
?(0, 0, 100000, f801b360, f96a4e98, 1000) at 0xf800670c
?(700, f96a4e98, 0, f80b30bc, f85ca700, 0) at 0xf80b30d8
?(f96a4e08, f8048708, f8586100, f8609800, 0, f86097f8) at 0xf80487dc
?(f96a4e08, f80a64e4, f80fa800, 5, f860f880, f859e880) at 0xf80a6518
?(f86083c0, f96a4e98, f860f880, f80417a4, 53000, 18) at 0xf804185c
?(9, f96a4f28, f96a4f20, f8023ad0, 0, 18) at 0xf8023b90
?(3, f96a4fb0, 267c8, 0, 0, 18) at 0xf80e0a64
?(5, 53000, 1000, 0, ffffffff, 1000) at 0xf800670c
db> 
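
The claim that the return addresses match can be checked mechanically.
A small sketch, with the frame lines of both traces pasted in verbatim;
the names FIRST, SECOND and return_addresses are just for illustration:

```python
import re

# Frame lines from the first crash (symbolic annotations omitted).
FIRST = """\
?(f85c6900, 80, 1004, f801aea8, 4000c0, f9692b90) at 0xf80e0650
?(3, f9692fb0, 267c8, 0, 0, 15) at 0xf80063fc
?(0, 0, 100000, f801b360, f9692e98, 1000) at 0xf800670c
?(700, f9692e98, 0, f80b30bc, f85d1d00, a) at 0xf80b30d8
?(f9692e08, f8048708, f8586100, f85c6500, 0, f85c64f8) at 0xf80487dc
?(f9692e08, f80a64e4, f80fa800, 5, f85f9900, f85e09c0) at 0xf80a6518
?(f86067c0, f9692e98, f85f9900, f80417a4, 4f000, 15) at 0xf804185c
?(9, f9692f28, f9692f20, f8023ad0, 0, 15) at 0xf8023b90
?(3, f9692fb0, 267c8, 0, 0, 15) at 0xf80e0a64
?(5, 4f000, 1000, 0, ffffffff, 1000) at 0xf800670c
"""

# Frame lines from the second crash.
SECOND = """\
?(f85c4c00, 80, 1004, f801aea8, 4000c4, f96a4b90) at 0xf80e0650
?(3, f96a4fb0, 267c8, 0, 0, 18) at 0xf80063fc
?(0, 0, 100000, f801b360, f96a4e98, 1000) at 0xf800670c
?(700, f96a4e98, 0, f80b30bc, f85ca700, 0) at 0xf80b30d8
?(f96a4e08, f8048708, f8586100, f8609800, 0, f86097f8) at 0xf80487dc
?(f96a4e08, f80a64e4, f80fa800, 5, f860f880, f859e880) at 0xf80a6518
?(f86083c0, f96a4e98, f860f880, f80417a4, 53000, 18) at 0xf804185c
?(9, f96a4f28, f96a4f20, f8023ad0, 0, 18) at 0xf8023b90
?(3, f96a4fb0, 267c8, 0, 0, 18) at 0xf80e0a64
?(5, 53000, 1000, 0, ffffffff, 1000) at 0xf800670c
"""


def return_addresses(trace):
    """Pull the address after 'at' out of every frame line, in order."""
    return re.findall(r"\bat (0x[0-9a-f]+)", trace)


first = return_addresses(FIRST)
second = return_addresses(SECOND)
print("identical" if first == second else "different")
```

Only the "at" addresses are compared; the frame arguments differ between
the two crashes (stack and buffer pointers, file offsets), as one would
expect.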

The second time, the kernel took a coredump into the swap area, but
savecore hung hard trying to save it, so I don't have it to inspect.
While it does not appear to be repeatable at will (ps usually runs to
completion just fine), it does appear to happen often enough that
telling me "look at <foo> when it happens" would be useful.

Any ideas, anyone?  The kernel is built from sources supped just a
couple of days ago, with four minor patches: (1) tickadj added to
kernfs, (2) count increased in TIME_WAIT() in arch/sparc/dev/dmavar.h,
(3) the rcons code tweaked so it doesn't come up in reverse video, and
(4) kern_exec.c patched so that set-id bits aren't disabled by process
tracing if the tracing process is root.

					der Mouse

			    mouse@collatz.mcrcim.mcgill.edu