Re: SS20 - Upgrade to 9.2 crashes with memory errors
Riccardo Mottola wrote:
> Hi All,
> I had a perfectly working SS20 with HyperSparc modules running 9.0
> Bite the bullet... let's upgrade! First I tested the kernel (but not MP)
> and it did boot!
> So I dumped the miniroot on swap and proceeded to upgrade! All went fine.
> On reboot... after a while I see on serial console repeated errors. I
> was able to grab a part below and paste it.
> Does anybody else have issues? I cannot get a full dmesg right now and I
> don't know if I can boot a 9.0 series kernel.
> Given the previous "tests" with the kernel, I fear the issue is
> about SMP!
I don't think this is SMP, although I was perhaps fooled because when booting
the single-CPU kernel I did not see errors.
Took the computer out of the pile and tried with the SMP kernel:
0) boot with CPU 1 & 2 and reproduce the error (needs "a while" with
some work, not just login)
1) boot CPU 1 (lower slot) get the error after some compilation
2) boot CPU 2 (lower slot) get the error after some compilation
3) get out of storage old SuperSparc module (SM50?) with no cache, put
in lower slot and get strange issues like:
assertion "(*wnumtop) == 0" failed: file
439, function "bn_div_fixed_top
when using telnet or ssh; I then tried some compilation locally and
reproduced the error! Darn!
Then, looking more closely, it says module J0202.
So I took J0202 out, put the one in J0304 into J0202 and retried. Seems
I tried shuffling RAM around and putting the module in other places; it
causes a variety of strange errors on startup.
From this I'd say there is no CPU issue, but most probably a fried RAM
module, in a position that the self-tests and boot did not immediately detect.
No SMP/kernel issue either, and the upgrade to 9.2 is a mere coincidence.
What do you think?
> [ 301.8077970] address: 0x0abb3d40
> [ 301.8077970] module location: J0202
> [ 301.8536050] Async registers (mid 8): afsr=0x0<AFA=0x0>; afva=0x00
> [ 301.8536050] cpu0: NMI: system interrupts:
> [ 301.8536050] SX STATUS: 00005400
> [ 301.8536050] SX ERROR : 00000000
> [ 301.8536050] SX DIAG : 00000000
> [ 301.8536050] memory error:
> [ 301.8536050] EFSR: 0x12611<CE,DW=0x1,SYNDROME=0x26,ME>
> [ 301.8536050] MBus transaction:
> [ 301.8536050] address: 0x0ab93d40
> [ 301.8536050] module location: J0202
> [ 301.8836230] Async registers (mid 8): afsr=0x0<AFA=0x0>; afva=0x00
> [ 301.8836230] cpu0: NMI: system interrupts:
> [ 301.8836230] SX STATUS: 00005400
> [ 301.8836230] SX ERROR : 00000000
> [ 301.8836230] SX DIAG : 00000000
> [ 301.8836230] memory error:
> [ 301.8836230] EFSR: 0x12611<CE,DW=0x1,SYNDROME=0x26,ME>
> [ 301.8836230] MBus transaction:
> [ 301.8836230] address: 0x0abb3d41
> [ 301.8836230] module location: J0202
> [ 301.9536150] Async registers (mid 8): afsr=0x0<AFA=0x0>; afva=0x00
> [ 301.9536150] cpu0: NMI: system interrupts: 0x10080000<VME=0x0,SBUS=0x0,T
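
As a sanity check on the EFSR value in the log above, here is a minimal
Python sketch that splits 0x12611 into the fields the kernel prints. The bit
positions are an assumption, inferred only from the decoded output in the
log (CE, DW=0x1, SYNDROME=0x26, ME) and not taken from the memory controller
documentation; decode_efsr is a hypothetical helper name.

```python
# Field layout inferred from the kernel's own decoding of 0x12611
# (assumption: not checked against the hardware documentation).
EFSR_CE = 0x00001          # bit 0, printed as "CE"
EFSR_DW_SHIFT = 4          # bits 4-7, printed as "DW"
EFSR_DW_MASK = 0xF
EFSR_SYND_SHIFT = 8        # bits 8-15, printed as "SYNDROME"
EFSR_SYND_MASK = 0xFF
EFSR_ME = 0x10000          # bit 16, printed as "ME"


def decode_efsr(value):
    """Split a raw EFSR value into the fields shown in the console log."""
    return {
        "CE": bool(value & EFSR_CE),
        "DW": (value >> EFSR_DW_SHIFT) & EFSR_DW_MASK,
        "SYNDROME": (value >> EFSR_SYND_SHIFT) & EFSR_SYND_MASK,
        "ME": bool(value & EFSR_ME),
    }


if __name__ == "__main__":
    fields = decode_efsr(0x12611)
    for name, val in fields.items():
        # Print numeric fields in hex to match the kernel's formatting.
        print(name, hex(val) if not isinstance(val, bool) else val)
```

If CE means "correctable error", as the name suggests, the repeated reports
with the same syndrome would fit ECC catching a bad bit on the module in
J0202 rather than a CPU or kernel problem.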