Subject: Re: Memory problems... not!
To: Jonathan Stone <jonathan@DSG.Stanford.EDU>
From: Kim G. S. OEyhus <kim@iq.pvv.ntnu.no>
List: port-pmax
Date: 04/29/1998 09:17:14
> ... A temperature-dependent bug like this just reeks of a hardware
> fault: either dry or failing solder joints, or a component at the
> ragged edge of failure.
> 
> I'd wait for the temperature to rise, and then spray specific parts of
> the board with "freeze" (spraycans containing compressed refrigerant).
> With luck, you can narrow the problem down to a single area or a
> single chip.  I'd suspect the cache, or anything from the CPU to the
> memory-bus parity glue.
> 
> Caution: This may be tricky if you can't access the underside of the
> motherboard. Also, the thermal stress may make the problem worse.  


It is now done. It was the processor. In fact, the thermal stress
appears to have improved its behaviour: I had to insulate the chips
thermally to get the bug to come back.

Spray-cooling chips is fun. I cooled the processor so that carbon dioxide
condensed on it, and then water, so it had a layer of ice. Somewhat
brutal, but considering that the machine is worthless if I don't 
succeed, it is justified.

Anyway, the MIPS processor did not have a heatsink, so I have just
bought one, along with Electrolube silicone heat-transfer paste.
This should work, since the fault disappears with very little
cooling.


Debugging the core dumps did not give me any usable information
about the cause of the fault. gdb did not tell me anything useful
about what had happened, or where in the code it happened. Maybe I could
have found it if I had spent a couple more evenings on it, but my time is
limited. I got the impression that gdb on pmax/MIPS is somewhat weak...

Kim0