Subject: Re: Memory?
To: Kim G. S. Øyhus <kim@iq.pvv.ntnu.no>
From: Simon Burge <simonb@telstra.com.au>
List: port-pmax
Date: 04/17/1998 18:05:48
On Fri, 17 Apr 1998 09:17:14 +0200 (MET DST)  Kim G. S. Øyhus wrote:

> > Kim G. S. Øyhus wrote:
> > > However, primes crashes, gcc gets internal errors, gzip often
> > > fails when decompressing, and ftp often gives faulty transfers. All
> > > sufficiently big systems fail.
> > 
> >   Yeah, my 3100 dies mysteriously as well.  It panics occasionally
> > (usually right before starting sh when booting single-user) with
> > an error that I don't have handy at the moment (something like
> > "ktlbmiss").  I haven't investigated yet, though.
> 
> Hmmm. That could be a memory problem too. Perhaps some of the gurus
> could make a kernel that reported memory errors on the console? Just
> so we know what the problem is. It would be even better if it printed
> the address, or memory bank, so the faulty memory could be removed.
> And how does one change kernels on a pmax, anyway? I tried changing
> "/netbsd" from 1.2 to 1.3, but then it stopped booting altogether.

From looking at the code, parity errors should be detected and cause a
panic on the machines without error correction (see below), but I've never
actually seen one happen here...  The code for the 3100 memory error
detection is in /sys/arch/pmax/pmax/dec_3100.c

> It could of course be interesting to make the memory correcting system
> work, but it is the chicken and egg problem: no chicken & no egg,
> can't compile, probably because of faulty memory, and therefore
> can't check if it is a memory problem, or make the kernel error-detecting.

Of the machines that run NetBSD, only the 5000/2xx series machines have
error correction.  The others (the 2100, 3100, 5000/xx and 5000/1xx)
only have parity.  On the 5000/2xx series machines, error correction
does "work" - it's done in hardware.  The kernel prints a message on the
console and continues on.  (Yes, I've got a machine that happily prints
these all the time, and keeps on chugging.)

Simon.