Subject: Re: what's this machine check mean?
To: None <port-alpha@netbsd.org>
From: der Mouse <mouse@Rodents.Montreal.QC.CA>
List: port-alpha
Date: 04/18/2000 02:27:31
>> In view of the latter, I suspect thermal problems - especially since
>> the 3.3V regulator heatsink gets hot enough to concern me.  I'm
>> pondering ways to cool things more effectively.
> Oh yes!  Cool the bugger.  I've seen very weird things on a NoName
> with an overheating CPU and/or regulator.

I now think it's *not* thermal. :-/

I managed to lay a fan on top of the CPU and regulator heatsinks so it
blows air down and through them.  Keeps them both cool enough that I'm
not worried.

And the machine checks still happen.  But interestingly...

- sitting idle at a (single-user) shell: it lasts a long time (>1hr).

- sitting doing "while :; do :; done": it lasts a long time.

- find / -type f -print | while read fn; do echo "$fn"; cat "$fn" > /dev/null; done
  falls over before it gets into /usr.

- Reading data from a wide-open TCP connection (another host on the
  house (10Mb) Ethernet blasting /dev/zero at it) and throwing it away:
  falls over within minutes.
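
For anyone wanting to repeat the file-reading test above, here is a
minimal sketch of it wrapped in a function.  The name read_all_files and
the optional root argument are assumptions added for illustration; the
loop itself is the same find/read/cat pipeline, with IFS= and -r added so
that names containing backslashes or leading blanks are read intact.

```sh
#!/bin/sh
# Read every regular file under a directory and throw the contents away.
# Each name is echoed before the file is read, so the last name printed
# before a machine check shows roughly how far the scan got.
# read_all_files is a hypothetical helper name, not part of any system.
read_all_files() {
    root=${1:-/}    # default to the whole filesystem, as in the test above
    find "$root" -type f -print | while IFS= read -r fn; do
        echo "$fn"
        cat "$fn" > /dev/null
    done
}
```

Calling it as "read_all_files /" matches the run described above; passing
a smaller directory such as /usr narrows the scan.  File names containing
newlines are not handled.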

And every last one of the machine checks I've seen has had an address
in about the same range as the two that were in lca_mem_read_1.  I
can't believe this is coincidence; I believe there is something flaky -
on board or chip, I don't know how to tell - with the I/O subsystem.

					der Mouse

			       mouse@rodents.montreal.qc.ca
		     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B