Port-sparc64 archive


Re: UltraSPARC III... Stability issue ?



On Wed, 19 Mar 2014, BERTRAND Joël wrote:

> Same reboot this morning :
> 
> trap type 0x68: cpu 1, pc=410657c0 npc=410657ac
> pstate=0xffffffff99820092<,PEF,IE>
> Skipping crash dump on recursive panic
> panic: +fast data access MMU miss
> cpu1: Begin traceback...
> cpu1: End traceback...
> cpu0: shutting down
> cpu1: rebooting

Strange.  It claims to have taken a trap type 0x68, which is an MMU miss, 
but those messages come from trap(), and the MMU miss code doesn't call 
that routine.
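
To make the trap number concrete, here is a minimal sketch in C of how the 
fast MMU trap types decode.  It assumes the UltraSPARC layout in which each 
fast MMU trap owns four consecutive trap-table slots; the function name 
fast_mmu_trap_name() is made up for illustration and is not a kernel 
routine.

```c
/*
 * Illustrative only: map an UltraSPARC trap type to the fast MMU trap
 * it belongs to.  Each fast MMU trap occupies four consecutive
 * trap-table slots, so the low two bits of the type are masked off.
 * fast_mmu_trap_name() is a hypothetical helper, not a kernel function.
 */
static const char *fast_mmu_trap_name(unsigned int tt)
{
    switch (tt & ~0x3u) {
    case 0x64:
        return "fast instruction access MMU miss";
    case 0x68:
        return "fast data access MMU miss";
    case 0x6c:
        return "fast data access protection";
    default:
        return "not a fast MMU trap";
    }
}
```

With this mapping, 0x68 from the panic output decodes to the same "fast 
data access MMU miss" string that the panic line prints.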

A trap 0x68 should execute either the code at ufast_DMMU_miss if running 
in user mode, or kfast_DMMU_miss if running in kernel mode.  Both do the 
same thing: look for a matching entry in the TSB, or punt to data_miss if 
one can't be found.
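
As a rough model of that lookup, the sketch below treats the TSB as a 
direct-mapped table indexed by virtual page number.  It is a simplified 
illustration in C, not the locore.s assembly: the table size, the tag 
packing, the "data == 0 means invalid" convention, and the names 
tsb_lookup() and tsb_insert() are all assumptions made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT  13      /* 8 KB base pages */
#define TSB_ENTRIES 512     /* hypothetical size; must be a power of two */

struct tsb_entry {
    uint64_t tag;   /* simplified tag: context plus virtual page number */
    uint64_t data;  /* translation to load into the TLB; 0 = invalid here */
};

/* Simplified tag packing, not the hardware tag format. */
static uint64_t tsb_make_tag(uint64_t va, unsigned int ctx)
{
    return ((uint64_t)ctx << 51) | (va >> PAGE_SHIFT);
}

/* Each virtual page maps to exactly one slot (direct-mapped). */
static unsigned int tsb_index(uint64_t va)
{
    return (unsigned int)((va >> PAGE_SHIFT) & (TSB_ENTRIES - 1));
}

static void tsb_insert(struct tsb_entry *tsb, uint64_t va,
                       unsigned int ctx, uint64_t data)
{
    struct tsb_entry *e = &tsb[tsb_index(va)];

    e->tag = tsb_make_tag(va, ctx);
    e->data = data;
}

/*
 * Returns true on a hit and stores the translation in *data.
 * On a miss the real handler would branch to data_miss instead.
 */
static bool tsb_lookup(const struct tsb_entry *tsb, uint64_t va,
                       unsigned int ctx, uint64_t *data)
{
    const struct tsb_entry *e = &tsb[tsb_index(va)];

    if (e->data != 0 && e->tag == tsb_make_tag(va, ctx)) {
        *data = e->data;
        return true;
    }
    return false;
}
```

The point of the model is that a hit needs only one indexed load and a tag 
compare, which is why the fast handlers try the TSB first and leave the 
page table walk to the slower path.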

The code at data_miss will walk the page tables looking for a matching 
entry.  If the entry is not found it will usually generate an NFO entry 
and return.  There's some complicated code in case of dirty register 
windows, but that mostly calls data_access_fault(), not trap(), and 
data_access_fault() prints "data_access_fault", not "trap type", which 
comes from trap().

The only path I see from the miss handler to trap() is locore.s:1752, 
which is a rather nasty corner case where the CPU took an MMU miss fault 
while trying to save state to handle a data fault, which should never 
happen.  The problem is, the jump to slowtrap happens shortly after a 
software-initiated reset instruction, which should reset the machine and 
bomb out to the firmware.

So I find this crash really strange.  Maybe enabling DDB and dumping some 
registers will shed some light on this panic.

Eduardo

