Subject: Re: SparcStation 20 SMP trouble
To: None <port-sparc@netbsd.org>
From: Malte Dehling <mdehling@math.ruhr-uni-bochum.de>
List: port-sparc
Date: 05/09/2005 16:29:10

On Mon, May 09, 2005 at 03:35:02PM +0200, Bernd Sieker wrote:
> On 09.05.05, 13:40:33, Malte Dehling wrote:
> >
> > I don't have another SS10/20 right now to test if it's really the modules, but
> > it looks indeed as if something were broken. I will do some stress-testing with
> > the `slightly broken' module tomorrow, just to see what happens.
> > I'm still wondering why I get memory errors. Are they caused by a bad CPU?
>
> Unless they're explicitly from the SS20's ECC memory controller
> ("eccmemctl0 at mainbus0 ioaddr 0x0: version 0x0/0x1") they're
> likely to be caused by a defective Cache or Cache controller.

Last time I had a broken memory module, the errors looked like this (from the
logs):

Jan 29 11:27:42 wks01 /netbsd: cpu0: NMI: system interrupts: 10000000
Jan 29 11:27:42 wks01 /netbsd: memory error:
Jan 29 11:27:42 wks01 /netbsd: 	EFSR: e621<CE,DW=2,SYNDROME=e6>
Jan 29 11:27:42 wks01 /netbsd: 	MBus transaction: 8fffdd50<VAH=0,TYPE=5,SIZE=5,C,LOCK,VA=ff,S,MID=8>
Jan 29 11:27:42 wks01 /netbsd: 	address: 0x0af0a80
Jan 29 11:27:42 wks01 /netbsd: 	module location: J0201

(Correctable Error). I removed the module and I never got the error again.
This time I have:

memory error:
	EFSR: 10002<DW=0,SYNDROME=0,ME>
	MBus transaction: fc10d30<VAH=0,TYPE=3,SIZE=5,C,VA=4,S,MID=0>
	address: 0x0f028f000
	module location: ?

(unknown module location!) but before that I get this:

cpu0: NMI: system interrupts: 40080000<VME=0,SBUS=0,T,ME>
module0:
        mxcc error 0x0
        mxcc status 0xff1410002
        mxcc reset 0x0
module1:
        mxcc error 0xb304000001d6900
        mxcc status 0xff1402000
        mxcc reset 0x0
dump cpu0: NMI: system interrupts: 50080000<VME=0,SBUS=0,T,M,ME>

So I think it's indeed something with the cache (of module1)...
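
To compare register dumps like the ones above side by side, the
`value<FLAG,KEY=VAL,...>' strings can be split into their parts. This is
only a rough sketch: the function name parse_register is made up for
illustration, and the field values are kept as strings because the radix
of each field is an assumption I don't want to make here.

```python
import re

# Matches "<hex value>" optionally followed by "<comma-separated items>",
# as seen in the log excerpts (e.g. "e621<CE,DW=2,SYNDROME=e6>").
_REGISTER = re.compile(r"^\s*(?:0x)?([0-9a-fA-F]+)(?:<([^>]*)>)?\s*$")


def parse_register(text):
    """Split a register dump into (value, flags, fields).

    parse_register is a hypothetical helper, not part of NetBSD.
    Items containing '=' become entries in `fields` (values left as
    strings); all other items are collected in `flags`.
    """
    match = _REGISTER.match(text)
    if match is None:
        raise ValueError("unrecognised register dump: %r" % (text,))
    value = int(match.group(1), 16)
    flags = []
    fields = {}
    for item in filter(None, (match.group(2) or "").split(",")):
        name, sep, val = item.partition("=")
        if sep:
            fields[name] = val
        else:
            flags.append(name)
    return value, flags, fields


if __name__ == "__main__":
    for dump in ("e621<CE,DW=2,SYNDROME=e6>", "10002<DW=0,SYNDROME=0,ME>"):
        print(parse_register(dump))
```

With the two EFSR lines above, the first yields the flag CE and the second
the flag ME, which makes the difference between the two cases easy to spot.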

> (iirc the MXCC also handles MBus communications).

What about modules without cache? They don't have cache controllers. That is
where my next question comes from: Is it possible to disable the cache in
NetBSD? It would still be a lot better than my previous configuration (I
had a single 50MHz module without cache).

>
> If you're lucky the extended POST (diag-switch?=true) catches them,
> but it might not.

Extended POST runs just fine, except for the `Data Access Error' after it
has finished. See:

http://dnsspam.student.utwente.nl/~mdehling/files/sys/ss20-boot1.log
http://dnsspam.student.utwente.nl/~mdehling/files/sys/ss20-boot2.log

>
> I've also just come across a paragraph in "The Rough Guide to MBus Modules", on
>   http://mbus.sunhelp.org/modules/#super
>
> --- quote ---
>   WARNING:  it has recently come to light that some SM41, SM51 and
>   maybe SM71 modules appear to be specific to the Fujitsu Teamserver,
>   and do not work in other systems. Unfortunately there is no known
>   way to distinguish these Fujitsu-custom modules from "regular"
>   ones, as both have the same "501-" part-number stickers. If you
>   have any information on how to distinguish these modules, please
>   email spooferman@excite.com .
> --- end quote ---
>
> So if it turns out your modules are from a "Fujitsu Teamserver"
> you might be out of luck.
>

I will try to find out.

>
> > ---
> > According to http://mbus.sunhelp.org/modules/index.htm, this module has
> > MXCC 3.3.
>
> Ah yes, module-info. I somehow thought the PROM would print MXCC
> revision on normal bootup ...
>=20
> >
> > -- 
> > Malte Dehling
> >
> > Mail:		mdehling [at] math.ruhr-uni-bochum.de
> > Website:	http://mdehling.ath.cx/
> > PGP:		2586 A3BF B438 E68E 2B85  C4EA C5A7 AD96 C865 03D2
>
>
>
> -- 
> Bernd Sieker
>=20
> My other computer runs NetBSD
> 		-- Allen Briggs

-- 
Malte Dehling

Mail:		mdehling [at] math.ruhr-uni-bochum.de
Website:	http://mdehling.ath.cx/
PGP:		2586 A3BF B438 E68E 2B85  C4EA C5A7 AD96 C865 03D2
