port-i386: Re: Physical memory tests?

Subject: Re: Physical memory tests?
To: Terry Moore <tmm@mcci.com>
From: Don Lewis <Don.Lewis@tsc.tdk.com>
List: port-i386
Date: 08/03/1996 20:46:18

On Aug 3,  9:11pm, Terry Moore wrote:
} Subject: Re: Physical memory tests?
} 2)  It is not completely clear that ability to correct any single-bit
} error (in a 64-bit word) and detect any two-bit error give a longer
} MTBF when compared to per-byte parity.  Consider: any error that 
} would be corrected by ECC would be detected by parity.

Yes, but a detected parity error will either kill a process or cause
a system panic, depending on whose memory was in error.  A corrected
ECC error should just get logged and life should go on.  Of the
workstations I administer, the two with 48 MB of parity memory panic
about once a year with a memory parity error.  The two with 64 MB of
ECC memory just chug along until either the power fails long enough
to exhaust the UPS or I shut them down to install patches or fiddle
with hardware.
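
To make the difference concrete, here is a minimal sketch comparing per-byte
parity with a single-error-correcting, double-error-detecting (SECDED) code on
a 64-bit word.  The 72-bit layout is a textbook extended Hamming code chosen
for illustration; it is an assumption, not the code any particular memory
controller uses, and all function names are hypothetical.

```python
# Sketch only: textbook extended Hamming (72,64) SECDED vs. per-byte parity.
# The bit layout and names are illustrative assumptions, not real hardware.

# Codeword positions 1..71: powers of two hold check bits, the other 64
# positions hold data.  Position 0 holds an overall parity bit.
DATA_POS = [p for p in range(1, 72) if p & (p - 1)]


def syndrome(code):
    """XOR of the positions (1..71) of all set bits; 0 for a valid codeword."""
    s = 0
    for p in range(1, 72):
        if (code >> p) & 1:
            s ^= p
    return s


def ecc_encode(data):
    """Spread 64 data bits over the codeword and fill in the 8 check bits."""
    code = 0
    for i, p in enumerate(DATA_POS):
        if (data >> i) & 1:
            code |= 1 << p
    s = syndrome(code)
    for k in range(7):                  # set check bits so the syndrome is 0
        if (s >> k) & 1:
            code |= 1 << (1 << k)
    if bin(code).count("1") & 1:        # make the total number of 1s even
        code |= 1
    return code


def ecc_decode(code):
    """Return (data, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    s = syndrome(code)
    odd = bin(code).count("1") & 1
    status = "ok"
    if odd:
        if s > 71:                      # syndrome points outside the codeword
            return None, "uncorrectable"
        code ^= 1 << s                  # single-bit error: flip it back
        status = "corrected"
    elif s != 0:                        # even number of flips, bad syndrome
        return None, "uncorrectable"
    data = 0
    for i, p in enumerate(DATA_POS):
        if (code >> p) & 1:
            data |= 1 << i
    return data, status


def parity_bits(data):
    """One even-parity bit per byte of a 64-bit word."""
    return [bin((data >> (8 * b)) & 0xFF).count("1") & 1 for b in range(8)]


def parity_ok(data, stored):
    """Parity can only say whether the word still matches; it cannot repair."""
    return parity_bits(data) == stored


word = 0x0123456789ABCDEF
code = ecc_encode(word)
print(ecc_decode(code ^ (1 << 37)))                # one flipped bit
print(ecc_decode(code ^ (1 << 37) ^ (1 << 5)))     # two flipped bits
print(parity_ok(word ^ (1 << 20), parity_bits(word)))
```

With one flipped bit the ECC decoder hands back the original word and only
reports that a correction happened, which is the "log it and carry on" case.
Parity notices the same flip but has no way to repair it, so the only safe
response is to kill the process or panic.
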

} Since the
} described situation was undetected errors, and multi-bit errors tend
} to come in multiples greater than 2 (they're usually due to connector faults
} or broken multi-bit silicon; common widths are x1 and x4), it is not
} even clear that ECC would have helped in this situation.

True, but such faults tend to be very obvious and are pretty easy to track
down given the proper tools.  They also have at least as good a chance
of being detected (but not corrected) by ECC as by parity.  The fact
that the errors were not detected at all makes me suspect that non-parity
SIMMs were used, so no parity or ECC detection was possible.

			---  Truck