Subject: Re: Memory Fault Reported By Kernel
To: None <port-pmax@netbsd.org>
From: Nick Boyce <nick@glimmer.demon.co.uk>
List: port-pmax
Date: 09/23/2000 21:45:33
[following up my own posting]

On Fri, 22 Sep 2000 02:43:21 +0100, Nick Boyce wrote:

> I've just converted a DEC 5240 from Ultrix 4.4 to NetBSD 1.4.2, and in
> the first 24 hours we've had one occurrence of the following report in
> /var/log/messages :
>
>  Sep 21 10:20:01 rccnx4 /netbsd: CPU memory read ECC error at 0x00270824
>  Sep 21 10:20:01 rccnx4 /netbsd:    ECC 0xd39cdd0c

Thanks everybody for the general confirmation that this represents a
recovered single-bit error in ECC memory, and that it's worth trying to
reseat the memory modules.

This machine's system cabinet has never been opened since it went
live, so I wonder how much cruft I'll find inside ...

I'm just about to call in at work (it's Saturday, so I'm bound to get
promotion for this ;-) and try leaving the PROM-mode TEST command
running for the rest of the weekend (thanks to Brian Hechinger).  I
went and had a look at the list archive message mentioned by Aaron
Grier, and I'll see if I can work out which module the above messages
relate to.  It sounds from Jared Smolens' message as though the patch
mentioned by Aaron has indeed made it into the upcoming 1.5 release.
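
In case more of these turn up before I get the lid off, here is a rough
sketch of how one might tally the reports by address, to see whether they
cluster in one place.  It assumes the message format is exactly as in the
excerpt quoted above; the names ECC_LINE, tally_ecc_addresses and report
are my own invention, not anything from NetBSD, and it says nothing about
which physical module an address belongs to.

```python
import re
from collections import Counter

# Assumption: the kernel report looks exactly like the quoted excerpt,
# i.e. "CPU memory read ECC error at <hex address>".
ECC_LINE = re.compile(r"CPU memory read ECC error at\s+(0x[0-9a-fA-F]+)")


def tally_ecc_addresses(log_text):
    """Count how often each reported address appears in ECC error reports."""
    return Counter(m.group(1).lower() for m in ECC_LINE.finditer(log_text))


def report(path="/var/log/messages"):
    """Print each reported address with its count, most frequent first."""
    with open(path, errors="replace") as f:
        for addr, count in tally_ecc_addresses(f.read()).most_common():
            print(addr, count)
```

Calling report() on the box itself would list each address next to the
number of times it was logged; a single address repeating would at least
suggest one consistently failing location rather than random hits.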

I'm slightly bemused by the two addresses reported in my one incident
above: 0x00270824 and 0xd39cdd0c.  These are not close - is one the
failing address, and one the address of the instruction that attempted
the memory access?

I'll also reboot the Ultrix system (I left it there on a different
disk) and see whether the uerf log shows the same kind of reports, as
mentioned by Uwe Lienig.

Thanks again everyone.

Nick Boyce
Bristol, UK
--
Quantum Mechanics: The dreams stuff is made of