Subject: Re: Dump command crashes machine
To: None <port-pmax@netbsd.org>
From: Jamie Scuglia <jamie@cs.monash.edu.au>
List: port-pmax
Date: 12/02/1999 10:02:28
"Aaron J. Grier" <agrier@poofy.goof.com> writes:

> On Mon, Nov 22, 1999 at 05:06:12PM +1100, Jamie Scuglia wrote:
> 
> => The /var/log/messages file has an error like this when we manage to
> => crash the machine:
> =>  
> => Nov 22 16:20:21 savecore: reboot after panic: panic("Mem error interrupt");

> Sounds like bad or loose memory...  I had some in my /240 that didn't
> normally give me any problems, but as soon as I tickled it hard enough
> (by compiling the world or doing some other memory-intensive process,)
> my machine would reboot with the same error you're seeing.  If you can scroll
> back in your logs, you'll likely see messages like "CPU memory read ECC
> error at 0x04403c98" which will tell you which module needs to be
> wiggled or removed.  (The output of 'cnfg 3' from the PROM prompt is
> helpful here.)

The funny thing is that it's not happening on just one of our machines,
but on many.  It would be too much of a coincidence for all of these
machines to have bad memory somewhere.  They all worked fine under
Ultrix, where they were subjected to much heavier loads.

Anyway, re-seating the memory cards didn't help.  On some systems,
we immediately get a segmentation fault when running "dump".  After
a reboot the machine behaves better, but it still crashes later
during certain largish filesystem dumps.  It's almost as if the
"dump" command itself is responsible.

We get many different errors like these few:

	trap: address error (store) in kernel mode
	panic: utlbmod: invalid segmap

	trap: TBL miss (load or instr-fetch) in kernel mode

	CPU memory read timeout error at 0x027361bc
	panic: panic("Mem error interrupt");

We've used "tar" to archive a partition holding 1 GB of data with
no problems.
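For concreteness, the invocations involved look roughly like the
following sketch (device and filesystem names here are illustrative
examples, not necessarily our exact DECstation setup; the commands
are printed rather than executed so this doubles as a dry run):

```shell
#!/bin/sh
# Illustrative only: tape and raw-disk names are example values.
TAPE=/dev/nrmt0
FS=/dev/rrz0g

# A level-0 dump (-0), recording it in /etc/dumpdates (-u), written
# to the tape device (-f) -- the kind of invocation that panics the
# machine on largish filesystems:
echo "dump 0uf $TAPE $FS"

# Archiving the equivalent mounted partition with tar completes fine:
echo "tar cf $TAPE /home"
```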

Is anyone using "dump" to back up their machines?  If not, what are
people using instead?

- Jamie.