Port-sparc64 archive

Re: Blade 2000 stability issue



On Wed, 31 Jul 2013, Martin Husemann wrote:

> On Wed, Jul 31, 2013 at 12:49:58PM +0200, BERTRAND Joël wrote:
> > cpu0 at mainbus0: SUNW,UltraSPARC-III+ @ 900 MHz, UPA id 0
> > cpu1 at mainbus0: SUNW,UltraSPARC-III+ @ 900 MHz, UPA id 1
> 
> So cpu1 was trying to send an IPI to cpu0, which did not respond (probably
> because it had interrupts disabled). We have another report of similar
> nature, on quite different hardware, but it is pretty hard to debug.

In this case SIR is your friend.  When the IPI times out, have that CPU 
execute an SIR instruction.  I think it should cause all CPUs to jump to 
OBP and dump their registers.

If SIR is not wired up on that system to reset all CPUs, have the 
operation print a message and hang; the user can then pull the XIR line.  
That will definitely cause all CPUs to jump to OBP and dump state.
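
A rough sketch of the idea in C, as a user-space model rather than actual 
kernel code.  All names here (ipi_wait_for_ack, the callback types, 
struct fake_cpu) are made up for illustration and are not taken from the 
NetBSD sources.  In a real kernel the timeout hook would be where the 
sending CPU executes the SIR instruction, or prints a message and spins 
so that someone can trigger XIR by hand.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical callback: returns true once the target CPU has acked the IPI. */
typedef bool (*ipi_ack_fn)(void *ctx);

/*
 * Hypothetical callback run when the wait times out.  In a kernel this is
 * where the sender would execute SIR (or print a message and hang so the
 * user can trigger XIR).  Here it is only a hook so the logic can be tested.
 */
typedef void (*ipi_timeout_fn)(int target_cpu, void *ctx);

enum ipi_result { IPI_OK = 0, IPI_TIMED_OUT = 1 };

/* Spin up to max_spins times waiting for the ack, then call the timeout hook. */
enum ipi_result
ipi_wait_for_ack(int target_cpu, uint32_t max_spins,
    ipi_ack_fn acked, ipi_timeout_fn on_timeout, void *ctx)
{
	for (uint32_t i = 0; i < max_spins; i++) {
		if (acked(ctx))
			return IPI_OK;
	}
	fprintf(stderr, "ipi: cpu%d did not respond after %u spins\n",
	    target_cpu, (unsigned)max_spins);
	on_timeout(target_cpu, ctx);
	return IPI_TIMED_OUT;
}

/* Stand-in for a target CPU, used only to exercise the logic above. */
struct fake_cpu {
	int polls_until_ack;	/* negative means the CPU never responds */
	int polls;		/* number of times the sender has polled */
	int reset_requested;	/* set to 1 by the timeout hook */
};

/* Ack check for the fake CPU: succeeds after polls_until_ack polls. */
bool
fake_acked(void *ctx)
{
	struct fake_cpu *c = ctx;

	c->polls++;
	return c->polls_until_ack >= 0 && c->polls >= c->polls_until_ack;
}

/* Timeout hook for the fake CPU: only records that a reset was requested. */
void
fake_reset(int target_cpu, void *ctx)
{
	struct fake_cpu *c = ctx;

	(void)target_cpu;
	c->reset_requested = 1;
}
```

With a fake CPU that never acks (polls_until_ack = -1), 
ipi_wait_for_ack(0, 10, fake_acked, fake_reset, &cpu) returns 
IPI_TIMED_OUT and sets reset_requested; with one that acks after three 
polls it returns IPI_OK and the hook is never called.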

After that it's a question of finding out where the pc of the hung CPU 
happened to be at the time of the reset.

Eduardo

