Port-sparc archive

Re: SS20 - Upgrade to 9.2 crashes with memory errors



Hi,

Riccardo Mottola wrote:
> Hi All,
>
> I had a perfectly working SS20 with HyperSparc modules running 9.0.
> Bite the bullet.... let's upgrade! First I tested the kernel (but
> non-MP) and it did boot!
>
> So I dumped the miniroot onto swap and proceeded to upgrade. All went fine.
>
> On reboot, after a while, I see repeated errors on the serial console. I
> was able to grab a part of them and paste it below.
>
> Does anybody else have issues? I cannot get a full dmesg right now and I
> don't know if I can boot a 9.0 series kernel.
>
> Given the previous kernel "tests", I fear the issue is related to
> SMP!

I don't think this is SMP, although I may have been fooled because I saw
no errors when booting the single-CPU kernel.
I took the computer out of the pile and tried with the SMP kernel:
0) boot with CPU 1 & 2 and reproduce the error (needs "a while" with
some work, not just login)
1) boot with CPU 1 only (lower slot): get the error after some compilation
2) boot with CPU 2 only (lower slot): get the error after some compilation
3) get an old SuperSparc module (SM50?) with no cache out of storage, put
it in the lower slot and get strange issues like:

assertion "(*wnumtop) == 0" failed: file
"/usr/src/crypto/external/bsd/openssl/dist/crypto/bn/bn_div.c", line
439, function "bn_div_fixed_top

when using telnet or ssh. Locally, though, I tried some compilation and
reproduced the memory error. Darn!

Then, looking more closely, the log says module J0202.
So I took the module in J0202 out, put the one from J0304 into J0202 and
retried. It seems stable.

I tried shuffling RAM around and putting the suspect module in other
slots: it causes a variety of strange errors on startup.

From this I'd say there is no CPU issue, but most probably a fried RAM
module, in a position that the selftests and boot did not immediately
detect. There is no SMP/kernel issue either, and the upgrade to 9.2 is a
mere coincidence.

What do you think?

Riccardo

>
>
> Riccardo
>
>
> [ 301.8077970]  address: 0x0abb3d40
> [ 301.8077970]  module location: J0202
> [ 301.8536050] Async registers (mid 8): afsr=0x0<AFA=0x0>; afva=0x00
> [ 301.8536050] cpu0: NMI: system interrupts:
> 0x10080000<VME=0x0,SBUS=0x0,T,M>
> [ 301.8536050] SX STATUS: 00005400
> [ 301.8536050] SX ERROR : 00000000
> [ 301.8536050] SX DIAG  : 00000000
> [ 301.8536050] memory error:
> [ 301.8536050]  EFSR: 0x12611<CE,DW=0x1,SYNDROME=0x26,ME>
> [ 301.8536050]  MBus transaction:
> 0x8ffd4d50<VAH=0x0,TYPE=0x5,SIZE=0x5,C,VA=0xf5,S,MID=0x8>
> [ 301.8536050]  address: 0x0ab93d40
> [ 301.8536050]  module location: J0202
> [ 301.8836230] Async registers (mid 8): afsr=0x0<AFA=0x0>; afva=0x00
> [ 301.8836230] cpu0: NMI: system interrupts:
> 0x10080000<VME=0x0,SBUS=0x0,T,M>
> [ 301.8836230] SX STATUS: 00005400
> [ 301.8836230] SX ERROR : 00000000
> [ 301.8836230] SX DIAG  : 00000000
> [ 301.8836230] memory error:
> [ 301.8836230]  EFSR: 0x12611<CE,DW=0x1,SYNDROME=0x26,ME>
> [ 301.8836230]  MBus transaction:
> 0x8ff74d30<VAH=0x0,TYPE=0x3,SIZE=0x5,C,VA=0xdd,S,MID=0x8>
> [ 301.8836230]  address: 0x0abb3d41
> [ 301.8836230]  module location: J0202
> [ 301.9536150] Async registers (mid 8): afsr=0x0<AFA=0x0>; afva=0x00
> [ 301.9536150] cpu0: NMI: system interrupts: 0x10080000<VME=0x0,SBUS=0x0,T
>
>
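
For anyone wanting to check a longer capture of this kind of output, here
is a small sketch in Python that tallies which module location the memory
errors point at. It only assumes the "address:" and "module location:"
line shapes seen in the log quoted above; the names summarize, QUOTE and
STAMP are made up for illustration and are not part of any NetBSD tool.

```python
import re
from collections import Counter

# Sample lines copied from the quoted console output (mail quoting kept on purpose).
SAMPLE = """\
> [ 301.8077970]  address: 0x0abb3d40
> [ 301.8077970]  module location: J0202
> [ 301.8536050]  address: 0x0ab93d40
> [ 301.8536050]  module location: J0202
> [ 301.8836230]  address: 0x0abb3d41
> [ 301.8836230]  module location: J0202
"""

QUOTE = re.compile(r"^(?:>\s?)+")           # leading mail quote markers
STAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")  # kernel timestamp such as "[ 301.8077970]"


def summarize(text):
    """Return (Counter of module locations, list of reported addresses)."""
    modules = Counter()
    addresses = []
    for raw in text.splitlines():
        # Drop quoting and timestamp so only the message itself remains.
        line = STAMP.sub("", QUOTE.sub("", raw)).strip()
        if line.startswith("address:"):
            addresses.append(int(line.split(":", 1)[1].strip(), 16))
        elif line.startswith("module location:"):
            modules[line.split(":", 1)[1].strip()] += 1
    return modules, addresses


if __name__ == "__main__":
    modules, addresses = summarize(SAMPLE)
    for name, count in modules.most_common():
        print(name, count)
    print([hex(a) for a in addresses])
```

If every error names the same location, as here, that supports suspecting
the SIMM in that slot rather than the CPU or the kernel.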


