Subject: Re: IPC cras
To: Simon Raahauge DeSantis <xiamin@ghostpriest.rakis.net>
From: Eduardo Horvath <eeh@turbolinux.com>
List: port-sparc
Date: 04/24/2000 12:58:30

On Mon, 24 Apr 2000, Simon Raahauge DeSantis wrote:

> I've recently dragged out an old IPC of mine to see if I can get it working
> again. It would crash mysteriously and I thought it was memory problems
> (this happened also under SunOS 4.1.4). Now I've got some new SIMMs for it
> and I want to track down the error. Is there any way to tell from the panic
> message what SIMM or SIMM bank the error occurred in, or do I have to go
> through all twelve 1mb SIMMs trying to reproduce this sporadic crash?
> Here's the output from a recent crash on trying to shut down:
> Done running shutdown hooks.
> Apr 24 12:34:07 emperor syslogd: exiting on signal 15
> data fault: pc=0xf00c08bc addr=0x140 ser=80<INVAL>
> panic: kernel fault
> syncing disks... done

I've found that this sort of error is unlikely to be caused by a bad
SIMM.  It looks like the kernel is dereferencing a NULL pointer: the fault
address, 0x140, is a small offset from zero, which is what you typically
see when a structure member is read through a NULL pointer.  Since RAM
usually has ECC or at least parity, a bad SIMM should cause a memory
fault, not a data fault.  If it really is a H/W problem then it's more
likely to be the CPU or cache RAM than main memory.  It can also be caused
by a kernel bug.  Determining the root cause can be a difficult process.
If you're really interested I can recommend _Panic!_ by Chris Drake and
Kimberly Brown (it's available from Sun).
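
For what it's worth, the first check on a fault line like the one quoted
above can be sketched in a few lines of Python.  This is only a rough
illustration, not a diagnostic tool: the regular expression assumes the
exact "data fault: pc=... addr=... ser=..." layout shown in the panic
output, and the helper names and the 4 KB NULL_WINDOW cutoff are
hypothetical choices made for the example.

```python
import re

# Assumed layout of the fault line, copied from the quoted panic output.
FAULT_RE = re.compile(
    r"data fault: pc=(0x[0-9a-fA-F]+) addr=(0x[0-9a-fA-F]+) ser=([0-9a-fA-F]+)"
)

# Hypothetical cutoff: treat the first 4 KB as "NULL plus a member offset".
NULL_WINDOW = 0x1000


def parse_fault(line):
    """Return (pc, addr, ser) or None if the line is not a data fault line.

    pc and addr are integers; ser is kept as the raw string because its
    encoding is not stated in the message.
    """
    m = FAULT_RE.search(line)
    if m is None:
        return None
    pc, addr, ser = m.groups()
    return int(pc, 16), int(addr, 16), ser


def looks_like_null_deref(addr):
    """True when the faulting address is a small offset from zero."""
    return 0 <= addr < NULL_WINDOW


line = "data fault: pc=0xf00c08bc addr=0x140 ser=80<INVAL>"
pc, addr, ser = parse_fault(line)
print(hex(pc), hex(addr), looks_like_null_deref(addr))
```

A faulting address this close to zero points at software (a NULL pointer)
rather than at a particular SIMM; the pc value is what you would then look
up in the kernel's symbol table to find the offending function.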

Eduardo Horvath