Subject: Re: memtest for alpha?
To: None <port-alpha@netbsd.org>
From: Paul Mather <paul@gromit.dlib.vt.edu>
List: port-alpha
Date: 10/23/2002 12:15:02
On Wed, Oct 23, 2002 at 03:40:20PM +0200, Riccardo.Veraldi@fi.infn.it wrote:

=> If I were you, I would use the test utility from the SRM console.
=> When I had problems with memory, they always showed up when running a
=> test from the SRM console.

This prompts me to ask a question: I use a DEC 3000/500S that uses ECC
memory.  One or more of the sticks in there is (or is intermittently)
bad, but not fatally so.  I get "Warning: received processor
correctable error" messages on my console, usually whenever I am
stressing the VM system (by processing large files, or doing large
builds, etc.).

I am confident that the "Warning: ..." messages are reporting ECC
errors that have been corrected, because when Tru64 was installed on
the same hardware it explicitly said so (although the errors occurred
far less frequently there, presumably because NetBSD utilises available
memory better, and hence is more likely to touch the "bad" RAM).

The trouble is that whenever I do a "test mem" from the SRM prompt,
the memory passes all the checks!

Given I have a DEC 3000/600 from which I could theoretically swap
memory, how do I go about locating which memory module in the DEC
3000/500S is bad?  Is it possible to determine from the machine check
info that yields the "Warning: ..." message which module is bad?
Tru64 used to display a lot more additional info when it reported ECC
faults.  (Alas, Tru64 exists no more on the system, so I can't run it
to try and see this extra info.)  I looked at the NetBSD source code,
and the machine check info is only displayed for unexpected machine
checks.
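
For what it is worth, the swap experiment could be made measurable by
counting the warnings per day before and after moving modules.  A rough
Python sketch, assuming the kernel messages end up in a syslog-style
file (the log path and the "Mmm dd" line prefix are assumptions, and
count_by_day is a made-up name, not anything from NetBSD):

```python
from collections import Counter

# Message text as quoted in this mail; the rest of the line format is assumed.
MESSAGE = "Warning: received processor correctable error"


def count_by_day(lines):
    """Count matching lines per day, assuming a syslog-style 'Mmm dd' prefix."""
    counts = Counter()
    for line in lines:
        if MESSAGE in line:
            # First two whitespace-separated fields, e.g. 'Oct 23' (assumed format).
            day = " ".join(line.split()[:2])
            counts[day] += 1
    return counts


# Hypothetical usage; /var/log/messages is an assumed location:
#   with open("/var/log/messages", errors="replace") as f:
#       for day, n in count_by_day(f).items():
#           print(day, n)
```

If the count drops to zero after one particular module is swapped out
under a comparable load, that module is the likely suspect.  This only
narrows things down by elimination; it does not decode the machine
check info itself.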

On an unrelated note (but related perhaps to the original thread), my
DEC 3000/300 at home actually fails the LLSC memory test, but
otherwise works flawlessly.  (The LL/SC instructions, load-locked and
store-conditional, are used for multi-CPU environments, right, so a
failing LLSC test is unlikely to affect a uniprocessor machine?)
Anyway, I guess what I'm trying to say is that failing "test mem" at
the SRM prompt does not necessarily mean an unstable NetBSD system
(though passing is preferred).

Cheers,

Paul.

e-mail: paul@gromit.dlib.vt.edu

"Without music to decorate it, time is just a bunch of boring production
 deadlines or dates by which bills must be paid."
        --- Frank Vincent Zappa