Subject: Re: Isolating memory error
To: None <port-sparc@NetBSD.ORG>
From: Greg Earle <earle@isolar.DynDNS.ORG>
List: port-sparc
Date: 09/09/2003 09:45:49
> If it doesn't happen again, I'd be inclined to ignore it.  If it
> happens again, especially at the same address, some RAM may need
> reseating or replacing.

I'm more worried that it's a problem with one of the 2 SIMMs I just put in.

>> Any way to isolate this to the affected SIMM?
> 
> 0x01e76b89c says which SIMM, though not as pellucidly as you might
> wish: it's the one containing that physical address.

No sh*t, Sherlock  :-)

> My best guess at working out which SIMM it is follows.  I am not at all
> sure I have this right; if someone knows better, please correct me.
> 
> On a 20, SIMMs appear 64MB (the max SIMM size) apart.  Their base
> addresses are multiples of 0x04000000.  The address you cite,
> 0x01e76b89c, is 7*0x04000000 + 0x276b89c.  Thus, counting SIMMs from 0,
> it is in SIMM number 7 (which is obviously a 64M SIMM, since it
> contains an address above the 32M mark).  I don't know whether the 20's
> SIMM socket numbers bear a simple relation to their physical locations,
> but if you turn on diag mode in the ROMs ("setenv diag-mode? true" or
> "setenv diag-switch? true" or some such - check "printenv") you'll get
> a dump of what memory is present near the end of the POST.  You can
> then find out which one it is by pulling SIMMs until number 7 is
> reported empty on power-up.  I have memories from adding memory to my
> own 20 which may be incorrect after this long (it was months ago) but,
> if they are correct, indicate that sockets 6 and 7 are the two nearest
> the SBus connectors, the two with the small extra VSIMM socket added on
> to the side.
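
The arithmetic in the quoted paragraph can be checked with a short sketch.
It assumes, as the quoted message does (and the author says he is not sure
of it), that each SIMM slot occupies a 64 MB window of physical address
space. The names SLOT_SPACING and locate are made up for illustration.

```python
# Assumption taken from the quoted message, not verified against Sun docs:
# each SIMM slot on the SS20 occupies a 64 MB window of physical addresses.
SLOT_SPACING = 0x04000000  # 64 MB

def locate(phys_addr, spacing=SLOT_SPACING):
    """Return (slot_number, offset_within_slot) for a physical address."""
    return divmod(phys_addr, spacing)

slot, offset = locate(0x1E76B89C)
print(slot, hex(offset))              # slot number and offset inside that slot
print(offset > 32 * 1024 * 1024)      # is the offset past the 32 MB mark?
```

Under that assumption the address lands in slot 7 at offset 0x276b89c,
which is past the 32 MB mark, matching the figures quoted above.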

Thanks for the tip on turning on diag-switch?.  It says:

Available Memory 0x1f000000
Allocating SRMMU Context Table
Context Table allocated, Available Memory 0x1efc0000
Setting SRMMU Context Register
Context Table allocated, Available Memory 0x1efc0000
Setting SRMMU Context Table Pointer Register
RAMsize allocated, Available Memory 0x1efb0000
Allocating SRMMU Level 1 Table
Level 1 Table allocated, Available Memory 0x1efafc00
Mapping RAM @ 0xffef0000
RAM mapped, Available Memory 0x1efafa00
Mapping ROM @ 0xffd00000
ROM mapped, Available Memory 0x1efaf800
Mapping ROM @ 0x00000000
ROM mapped, Available Memory 0x1efaf000
ttya initialized
Cpu #0 TI,TMS390Z55
Cpu #1 Nothing there
Cpu #2 TI,TMS390Z55
Cpu #3 Nothing there
Probing Memory Bank #0 64 Megabytes of DRAM
Probing Memory Bank #1 64 Megabytes of DRAM
Probing Memory Bank #2 Nothing there
Probing Memory Bank #3 Nothing there
Probing Memory Bank #4 Nothing there
Probing Memory Bank #5 64 Megabytes of DRAM
Probing Memory Bank #6 Nothing there
Probing Memory Bank #7 32 Megabytes of DRAM
Probing /iommu@f,e0000000/sbus@f,e0001000 at f,0  espdma esp sd st ledma le SUNW,bpp
Probing /iommu@f,e0000000/sbus@f,e0001000 at e,0  SUNW,DBRIe
Probing /iommu@f,e0000000/sbus@f,e0001000 at 0,0  Nothing there
Probing /iommu@f,e0000000/sbus@f,e0001000 at 1,0  Nothing there
Probing /iommu@f,e0000000/sbus@f,e0001000 at 2,0  Nothing there
Probing /iommu@f,e0000000/sbus@f,e0001000 at 3,0  cgsix

SPARCstation 20 MP (2 X SuperSPARC-II), No Keyboard
ROM Rev. 2.22, 224 MB memory installed, Serial #7943709.

Do you guys still think it's in bank #7, given that #7 holds a 32 Mbyte
SIMM and not the 64 Mbyte SIMM you had surmised?
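
To make the mismatch concrete, here is a rough sketch that reads the
"Probing Memory Bank" lines from the output above and checks the faulting
address against them. It still assumes the 64 MB-per-bank spacing from the
quoted message, which may well not be how the 20 really maps memory; the
helper name parse_banks is made up for illustration.

```python
import re

# Bank probe lines copied from the POST output above.
PROBE_LOG = """\
Probing Memory Bank #0 64 Megabytes of DRAM
Probing Memory Bank #1 64 Megabytes of DRAM
Probing Memory Bank #2 Nothing there
Probing Memory Bank #3 Nothing there
Probing Memory Bank #4 Nothing there
Probing Memory Bank #5 64 Megabytes of DRAM
Probing Memory Bank #6 Nothing there
Probing Memory Bank #7 32 Megabytes of DRAM
"""

MB = 1024 * 1024
PATTERN = re.compile(
    r"Probing Memory Bank #(\d+) (?:(\d+) Megabytes of DRAM|Nothing there)"
)

def parse_banks(log):
    """Map bank number -> size in MB (0 for an empty bank)."""
    banks = {}
    for line in log.splitlines():
        m = PATTERN.match(line)
        if m:
            banks[int(m.group(1))] = int(m.group(2) or 0)
    return banks

banks = parse_banks(PROBE_LOG)
total_mb = sum(banks.values())

# Assumed layout (unverified): bank N starts at N * 64 MB.
bank, offset = divmod(0x1E76B89C, 64 * MB)
fits = offset < banks[bank] * MB

print(total_mb, bank, hex(offset), fits)
```

The total agrees with the 224 MB the ROM reports, but the offset does not
fit inside a 32 MB SIMM in bank 7. That only shows that the assumed layout
and the probe output disagree; it does not say which SIMM really holds the
address.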

I'm hoping it was just a cosmic ray hit - but if it happens again, I'd
like to know which SIMM to yank, in case that new memory was bad (the 32
Mbyte SIMM was one of the two "new" ones I added).

Thanks,

	- Greg