port-alpha: Re: LLSC MEM test?

Subject: Re: LLSC MEM test?
To: None <port-alpha@netbsd.org>
From: Paul Mather <paul@gromit.dlib.vt.edu>
List: port-alpha
Date: 06/07/1999 13:57:40
On Mon, 7 Jun 1999, Ross Harvey wrote:

=> Paul Mather <paul@gromit.dlib.vt.edu> writes:
=> > >>> test mem
=> > T-STS-MEM - LLSC Test: Addr 00800000 FWD Wr 00000000
=> > ? T-ERR-MEM - stl_c bcache miss with victim at Addr: 00d91bb8
=> > T-STS-MEM - Uncorrected Error count = 1
=> > ? T-ERR-MEM - FAILED status = 20 test Init addr = 00d91bb8
=> > ?? 810       MEM 0x0020
=> > [ ... ]
=> 
=> I don't have much insight into what the FW is doing or even into what that
=> failure message really means, but the apparent contradictions that you found
=> hard to resolve aren't too hard to speculate on... :-)
=> 
=> 	1. You don't actually "*know*" the SIMMs aren't bad, you just have
=> 	   a data point that says they tested good. Perhaps the 3000's are
=> 	   at slightly different rev or eco levels, or some critical part
=> 	   just happens to be faster on one. (There is a huge gap between
=> 	   min and max prop delays, and the 3000 is built with very low
=> 	   levels of integration, which means high relative uncertainty
=> 	   between different signals.)

Actually, I omitted this from the original message, but I do have
additional data points.  The same four 8MB SIMMs pass "test mem" in the
DEC3000/300LX they actually belong to, and from which they are currently
"on loan" to the machine in which "test mem" fails.  (So, in other
words, they pass "test mem" in two other DEC3000/300LXs.)  Also, two
32MB SIMMs from another 3000/300LX fail "test mem" in the machine I
reported about, but pass in the machine from which they were taken.

So, there is more than one data point to suggest the SIMMs are okay.

=> 	2. Like Chris said, maybe it's the bcache and not the DRAM.

Is the bcache on the mainboard, or on the CPU card?

=> 	3. Perhaps the FW turns off ECC for the purposes of the test.
=> 	   I would have. So, with NetBSD running, the errors are corrected.
=> 
=> 	4. I would bet the framebuffer RAM error is independent, but not
=> 	   necessarily. ECC won't deal with a completely stuck data line,
=> 	   not unless the FW is really smart and manages to turn on the
=> 	   `correct it without logging it' mode that at least some of the
=> 	   alpha HW has.

I don't mind the test failing per se.  I was just wondering if there
would be any "hidden" effects on the operation of NetBSD that were
serious enough to warrant doing something about.  Also, the framebuffer
RAM not working doesn't bother me, as it isn't supported by NetBSD
anyway.

But, like I said, everything *seems* to be working in NetBSD, so I guess
everything is grand.

Cheers,

Paul.

e-mail: paul@gromit.dlib.vt.edu

"I don't live today; maybe tomorrow..."
	--- James Marshall Hendrix