Subject: Isolating NMI/memory problem with old SPARCserver 20
To: None <port-sparc@NetBSD.ORG>
From: Greg Earle <earle@isolar.DynDNS.ORG>
List: port-sparc
Date: 08/06/2005 10:46:14
This is really a hardware issue, but since the machine is
running NetBSD/SPARC 1.6.1 ...

I'm starting to get semi-frequent NMI memory errors on my old
SPARCserver 20 (with Ross HyperSPARC 180) warhorse:

Aug  6 09:58:26 isolar /netbsd: Async registers (mid 0): afsr=2000<SE,AFA=0>; afva=0x00
Aug  6 09:58:26 isolar /netbsd: cpu0: NMI: system interrupts: 10080000<VME=0,SBUS=0,T,M>
Aug  6 09:58:26 isolar /netbsd: memory error:
Aug  6 09:58:26 isolar /netbsd:         EFSR: 5231<CE,DW=3,SYNDROME=52>
Aug  6 09:58:26 isolar /netbsd:         MBus transaction: 8fe38d30<VAH=0,TYPE=3,SIZE=5,C,VA=8e,S,MID=8>
Aug  6 09:58:26 isolar /netbsd:         address: 0x01421d6a0

Aug  6 09:58:27 isolar /netbsd: Async registers (mid 0): afsr=2000<SE,AFA=0>; afva=0x00
Aug  6 09:58:27 isolar /netbsd: cpu0: NMI: system interrupts: 10090000<VME=0,SBUS=0,E,T,M>
Aug  6 09:58:27 isolar /netbsd: memory error:
Aug  6 09:58:27 isolar /netbsd:         EFSR: 5231<CE,DW=3,SYNDROME=52>
Aug  6 09:58:27 isolar /netbsd:         MBus transaction: 8fe38d30<VAH=0,TYPE=3,SIZE=5,C,VA=8e,S,MID=8>
Aug  6 09:58:27 isolar /netbsd:         address: 0x01421d6a0

Looking back over the last week's worth of messages, the
flags on the NMI have varied, but the address is always the
same: 0x01421d6a0.
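
For anyone who wants to run the same check on their own logs, here is a
minimal sketch that tallies the "address:" values in kernel messages. It
assumes the lines look like the excerpt above; the function name and the
sample list are mine, purely for illustration.

```python
import re
from collections import Counter

# Matches the "address: 0x..." line printed after "memory error:".
ADDRESS_RE = re.compile(r"address:\s*(0x[0-9a-fA-F]+)")

def tally_error_addresses(lines):
    """Count how often each faulting address appears in the given log lines."""
    counts = Counter()
    for line in lines:
        match = ADDRESS_RE.search(line)
        if match:
            counts[int(match.group(1), 16)] += 1
    return counts

# Two lines copied from the excerpt above; in practice, read /var/log/messages.
sample = [
    "Aug  6 09:58:26 isolar /netbsd:         address: 0x01421d6a0",
    "Aug  6 09:58:27 isolar /netbsd:         address: 0x01421d6a0",
]

for addr, hits in tally_error_addresses(sample).items():
    print(f"{addr:#x}: {hits} error(s)")
```

A single address in the output means every error points at the same
location.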

This is just a DIMM going bad, right?  And, if so, how do I map
the address to the failing module?  (Update: I just saw an old post
to port-sparc from May 9th by Malte Dehling; he reported a similar
error, but his log also shows a "module location: " identifier.
Mine doesn't.  Is this a new reporting feature in NetBSD 2.0 or
something?)

I've got three 64 MB DIMMs (in banks 0, 1 and 5) for a total of 192 MB,
so I could live without one of 'em temporarily ... what's weird is
that I ran "test-memory" from the boot PROM (with "selftest-#megs?"
set to all 192 MB) and also booted in diag mode and let it test
memory there, and neither one hiccuped on that address ...
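
One thing the arithmetic does show: the failing address is at roughly
338 MB, well above 192 MB, so the three modules can't be mapped
back-to-back from zero. A rough sketch of one guess at the mapping
follows. It ASSUMES each bank owns a fixed 64 MB window of physical
address space, in bank order starting at 0; I have not confirmed that
against the hardware documentation, and the names here are made up for
illustration.

```python
# ASSUMPTION (unverified): bank N covers physical addresses
# [N * 64 MB, (N + 1) * 64 MB), whether or not the lower banks are populated.
BANK_WINDOW = 64 * 1024 * 1024

def bank_for_address(addr, window=BANK_WINDOW):
    """Return (bank number, byte offset within that bank) under the assumption above."""
    return divmod(addr, window)

fault_addr = 0x01421d6a0          # address from the NMI messages
populated_banks = {0, 1, 5}       # banks that actually hold a module

bank, offset = bank_for_address(fault_addr)
print(f"address {fault_addr:#x} -> bank {bank}, offset {offset:#x}")
print("bank is populated" if bank in populated_banks else "bank is empty?!")
```

Under that assumption the address lands in bank 5, which is at least one
of the populated banks, so pulling or swapping that module first looks
like a sensible experiment. If the error address moves with the module,
that settles it regardless of whether the mapping guess is right.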

	- Greg