re: UltraSPARC III... Stability issue ?
> data error type 32 sfsr=808004 sfva=41818400 afsr=10100000000000
> afva=13900000060 tf=0x2607efed0
> data fault: pc=1010234 addr=41818400 sfsr=0x808004<ASI=0x80,OW>
Dump of assembler code for function data_miss:
[ ... ]
0x0000000001010224 <+132>: ldxa [ %g5 ] #ASI_PHYS_USE_EC, %g4
0x0000000001010228 <+136>: sll %g6, 3, %g6
0x000000000101022c <+140>: brz,pn %g4, 0x1010278 <data_nfo>
0x0000000001010230 <+144>: add %g6, %g4, %g6
0x0000000001010234 <+148>: ldxa [ %g6 ] #ASI_PHYS_USE_EC, %g4
i'll need to consult the manual(s) to see why this load is faulting.
up to this point, the crash has proceeded normally.
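
for anyone reading along, here is a rough C model of what the quoted instructions in data_miss appear to do: load a table pointer, bail out if it is zero, then load the entry at index * 8 from that table. the names walk_step, table, index and entry are made up for illustration, and ordinary pointers stand in for the physical-address ldxa accesses, so treat this as a sketch and not the kernel's code.

```c
#include <stdint.h>
#include <stddef.h>

/*
 * Hypothetical model of one table-walk step, NOT the kernel's code.
 * "table" stands in for the pointer loaded by the first ldxa (%g4),
 * "index" for the value held in %g6 before the shift.
 */
static int
walk_step(const uint64_t *table, uint64_t index, uint64_t *entry)
{
	uint64_t offset = index << 3;	/* sll %g6, 3, %g6: 8-byte entries */

	if (table == NULL)		/* brz,pn %g4, data_nfo */
		return -1;

	/* add %g6, %g4, %g6 ; ldxa [ %g6 ] ...: load the next-level entry */
	*entry = *(const uint64_t *)((const char *)table + offset);
	return 0;
}
```

in the real code the add sits in the branch delay slot, so it executes whether or not the branch is taken. the reported pc=1010234 is the second load, i.e. the one modelled by the final dereference here.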
> kernel trap 32: data access error
but here, we're getting another fault trying to enter DDB.
> cpu1: data fault: pc=16068c8 rpc=10f5914 addr=ffffffffffff8000
Dump of assembler code for function memcpy:
[ ... ]
0x00000000016068bc <+348>: brz %l4, 0x1606a40 <memcpy+736>
0x00000000016068c0 <+352>: sllx %l4, 3, %l4
0x00000000016068c4 <+356>: mov 0x40, %l3
0x00000000016068c8 <+360>: ldx [ %l0 ], %o0
Dump of assembler code for function fill_ddb_regs_from_tf:
[ ... ]
213 DDB_REGS->db_fr = *(struct frame64 *)(uintptr_t)tf->tf_out;
0x00000000010f5904 <+68>: ldx [ %i5 + 0x1b0 ], %g1
0x00000000010f5908 <+72>: mov 0xb0, %o2
0x00000000010f590c <+76>: ldx [ %i0 + 0xa0 ], %o1
0x00000000010f5910 <+80>: ldx [ %g1 + 0x3e0 ], %o0
0x00000000010f5914 <+84>: call 0x1606760 <memcpy>
0x00000000010f5918 <+88>: add %o0, 0x130, %o0
so this looks like we fault trying to read the faulting lwp's
registers to save them for DDB to access. oops!
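
a small illustration of why a one-line struct assignment shows up as a call to memcpy: the compiler is free to lower a struct copy through a cast pointer into memcpy(dst, src, sizeof), which matches the 0xb0 length loaded into %o2 above. struct frame_model is a hypothetical stand-in sized to 0xb0 bytes, not the real struct frame64, and both function names are invented for this sketch.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct frame64; only the 0xb0 size matters. */
struct frame_model {
	uint64_t words[22];		/* 22 * 8 = 176 = 0xb0 bytes */
};

/*
 * Struct assignment through a cast pointer, in the shape of line 213.
 * The compiler may emit this as memcpy(dst, src, 0xb0).
 */
static void
copy_by_assignment(struct frame_model *dst, uintptr_t src)
{
	*dst = *(const struct frame_model *)src;
}

/* The same copy spelled out as the call seen in the disassembly. */
static void
copy_by_memcpy(struct frame_model *dst, uintptr_t src)
{
	memcpy(dst, (const void *)src, sizeof(*dst));
}
```

either form reads from src, so a bad source pointer taken from the trapframe faults inside memcpy on a load, which is consistent with the ldx at pc=16068c8.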
> kernel trap 30: data access exception
> Skipping crash dump on recursive panic
> panic: cpu0: ipi_send: couldn't send ipi to UPAID 1 (tried 10000 times)
this most likely happens because cpu1 is busy writing to the slow
console (serial or fb; either takes a relatively long time). we
might be able to raise the limit above 10000 to avoid that, or
have ipi_send notice when a remote CPU is panicking.
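
to make the second idea concrete, here is a toy model of a bounded retry loop that also checks whether the target CPU is already panicking. everything in it (struct cpu_model, try_send, ipi_send_model, the result enum) is hypothetical and simulates the remote CPU with a counter; it is not the sparc64 ipi_send() and does not touch any real dispatch registers.

```c
#include <stdbool.h>

enum ipi_result { IPI_SENT, IPI_TIMED_OUT, IPI_TARGET_PANICKING };

/* Hypothetical model of the remote CPU; none of this is kernel API. */
struct cpu_model {
	int  busy_polls;	/* attempts that fail before it accepts an IPI */
	bool panicking;		/* set once the remote CPU has entered panic() */
};

/* One dispatch attempt: fails while the remote CPU is still busy. */
static bool
try_send(struct cpu_model *cpu)
{
	if (cpu->busy_polls > 0) {
		cpu->busy_polls--;
		return false;
	}
	return true;
}

/* Retry up to "limit" times, but stop early if the target is panicking. */
static enum ipi_result
ipi_send_model(struct cpu_model *cpu, int limit)
{
	for (int i = 0; i < limit; i++) {
		if (try_send(cpu))
			return IPI_SENT;
		/* Report it instead of panicking a second time. */
		if (cpu->panicking)
			return IPI_TARGET_PANICKING;
	}
	return IPI_TIMED_OUT;
}
```

with a check like this the sender could skip its own panic when the other CPU is already on its way down, whatever the retry limit ends up being.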
> cpu0: Begin traceback...
> cpu0: End traceback...
i suspect the traceback is empty because we can't copy the faulting registers.
this is all likely related to the basic failure that triggers
the original fault in data_miss.
eeh, we've basically not touched data_miss etc since your original
code... any ideas what would be causing this?