Subject: SparcStation 20 SMP trouble
To: None <port-sparc@NetBSD.Org>
From: Malte Dehling <mdehling@dnsspam.student.utwente.nl>
List: port-sparc
Date: 05/08/2005 22:06:27
I've been running NetBSD 1.6 and 2.0 on my SparcStation 20 with a 50MHz
SuperSparc without cache (501-2708-072152) for a long time and I never had any
problems. Last week I got two 75MHz SuperSparc II CPUs with 1MB cache each
(501-2520), and as my PROM rev is 2.22 (the minimum required according to
http://mbus.sunhelp.org/systems/sun/ss20.htm) I expected them to work without
problems, which was not the case...

According to the person I got these from, the modules are tested and should be
ok. I took care when putting them in their place, so I think it's not something
I did, physically...

Anyway, when I turned on the SS/20 I got a `Power On Self Test', then I got a
`Data Access Exception'. After doing a reset at the ok prompt, the SS/20 boots
my NetBSD 2.0 kernel (no SMP). I downloaded the 2.0.2 GENERIC.MP kernel and
tried booting it (again, ignoring the Data Access Exception), which resulted in a
kernel panic, some `xcall(cpu1,some hex number): couldn't ping cpus: cpu0'
errors and some memory errors... I forgot to switch on logging in minicom for
this part, but after trying to boot a 2.0.0 GENERIC.MP kernel (which failed in
the same way), I set diag-switch? to `true', hoping to get some more
information, and enabled logging. The logs can be seen here:

http://dnsspam.student.utwente.nl/~mdehling/files/sys/ss20-boot1.log
http://dnsspam.student.utwente.nl/~mdehling/files/sys/ss20-boot2.log

(There are some `Data Access Exception', `Instruction Access Exception' and
`Level 15' errors. Kernels: netbsd.old = GENERIC 2.0, netbsd = netbsd.202 =
GENERIC.MP 2.0.2, netbsd.200 = GENERIC.MP 2.0)

Some more hints...
- <ok> 2 switch-cpu sometimes hangs the SS/20.
- I tried using the CPU that is now #2 in the lower slot as #0. POST hung for
about 10 minutes, then I reset the box. I may retry this with diag-switch? set
to `true' later.
- I get the `Data Access Exception' even when only the #0 CPU is in its slot. So
it's not just one broken CPU.
- The RAM modules should be ok; the box never crashed, even under extremely high
loads, compiling like 5 or 6 packages from pkgsrc at the same time...

Could it be something with the mainboard rev or so? IIRC some Ultra 2s, for
example, can only take UltraSparc I CPUs...

Any comments/suggestions are welcome.

PS: I will do some more testing and maybe buy some other RAM, etc. from Tuesday
on. I can't do too much testing until then, as I have the last of my final exams
on Tuesday morning... Wish me luck ;-)

---
Malte Dehling

Mail:           mdehling [at] math.ruhr-uni-bochum.de
Website:        http://mdehling.ath.cx/
PGP:            2586 A3BF B438 E68E 2B85  C4EA C5A7 AD96 C865 03D2