Subject: Re: Randomly crashing DECstation 5000/125 with NetBSD 1.5
To: Alexander Schreiber <alexander.schreiber@informatik.tu-chemnitz.de>
From: Chris Tribo <ctribo@del.net>
List: port-pmax
Date: 02/12/2001 00:25:26
on 2/11/01 7:54 PM, Alexander Schreiber at
alexander.schreiber@informatik.tu-chemnitz.de wrote something like:

> There are three modes of crash:
> 1. System drops back to the PROM prompt from which it reboots automatically
> (I set the haltaction parameter to ''r'')

    Probably an address/pointer mishmash caused by mixing RAM modules of
different sizes.

<snip of other weird behaviors>

> I strongly suspect this to be a hardware problem and I hope to find some
> people here with some knowledge of these machines - finding good documentation
> for such a machine seems to be quite a challenge unfortunately :-(

    Yes, AFAICT there isn't any documentation online for any machine newer
than the 5000/200.

> Point 3 leads me to believe that maybe this is a heat problem. But so far
> the only really hot part in the system seems to be the CPU module.

    How hot? Hot enough that you can't leave your finger on the heat sink
for more than a second? My 5000/200 ran nearly that hot (something like
115-120 degrees Fahrenheit) and never had any problems while it was in
service. I assume you have the case on normally and all the fans are
spinning. Ultrix has a function to detect CPU overheating, but AFAIK it
hasn't been implemented in NetBSD.

> Oh - and I found out that, opposed to the NetBSD documentation about this,
> you _can_ mix 2 MB and 8 MB memory modules. I put in 8 2 MB modules first,

    AFAIK, NetBSD does not support mixed memory modules. The hardware in
the machine itself does, but NetBSD does not (yet) have this implemented,
even though the code has been written more than once. I would try removing
the mixed modules and see if the crashes still occur. ATM, the NetBSD kernel
can't check what size modules are in which slots, so it probably ends up
assuming that all the modules are the same size (which they are not), and
when it gets to a different-sized module, the addressing goes screwy.

> then added 2 8 MB modules and ended up with 20 MB, the 8 MB modules
> obviously being addressed as 2 MB ones. Not very efficient, but 20 MB
> still looked better to me than 16 MB (currently 8x 2 MB is in the
> machine - those also cannibalized from the dead one, the 2x 8 MB came
> originally with this DS5000/125).
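
    The 20 MB figure is what you would expect if every populated slot is
treated as a 2 MB module. Here is a rough sketch of that arithmetic in
Python. It is a toy model only: the function name is made up for
illustration, and it does not reflect the actual NetBSD kernel code or how
the memory controller really decodes addresses.

```python
# Toy model: NOT NetBSD kernel code and NOT the real memory controller logic.
# Assumption: each populated slot contributes at most the assumed module size.

def usable_memory_mb(modules_mb, assumed_size_mb):
    """Return the memory seen when every slot is treated as assumed_size_mb."""
    return sum(min(size, assumed_size_mb) for size in modules_mb)

# Configuration from this thread: eight 2 MB modules plus two 8 MB modules.
modules = [2] * 8 + [8] * 2

installed = sum(modules)                             # physically installed
seen = usable_memory_mb(modules, assumed_size_mb=2)  # all slots treated as 2 MB

print(f"installed: {installed} MB, seen: {seen} MB")
```

    Under that assumption, 12 MB of the two 8 MB modules goes unused, which
matches the 20 MB reported above.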

    Did you run "test" (at least once) at the >> prompt to verify that the
chips and hardware are passing their self-tests consistently? That *should*
eliminate bad RAM as a culprit, since it has a fairly good RAM test.




    Chris

-- 

Murphy was an optimist.