Subject: Re: SS20/MP Watchdog Reset
To: None <port-sparc@netbsd.org>
From: Juergen Hannken-Illjes <hannken@eis.cs.tu-bs.de>
List: port-sparc
Date: 06/15/2004 21:24:29
On Tue, Jun 15, 2004 at 05:26:33PM +0000, Eduardo Horvath wrote:
> On Mon, Jun 14, 2004 at 11:54:46PM +0200, Juergen Hannken-Illjes wrote:
> > On this machine
> > 
> > 	total memory = 319 MB
> > 	cpu0 at mainbus0: mid 8: TMS390Z50 v0 or TMS390Z55 @ 85 MHz, on-chip FPU
> > 	cpu1 at mainbus0: mid 10: TMS390Z50 v0 or TMS390Z55 @ 85 MHz, on-chip FPU
> > 
> > running -current under heavy load I'm getting
> > 
> > 	Watchdog Reset
> > 	cpu0: NMI: system interrupts: 400c0000<VME=0,SBUS=0,SC,T,ME>
> > 	module0:
> > 		mxcc error 0x0
> > 		mxcc status 0xff1410002
> > 		mxcc reset 0x0
> > 	module1:
> > 		mxcc error 0x0
> > 		mxcc status 0xff1100000
> > 		mxcc reset 0x4 (WATCHDOG RESET)
> > 
> > The Watchdog Reset is always on module1. Software or hardware?
> 
> Watchdog resets are caused by taking a trap when traps are disabled.
> 
> This particular fault is a level 15 interrupt.  I think the only
> causes of level 15 interrupts are asynchronous memory errors.
> Since traps should only be disabled inside trap handlers, you
> are probably suffering from bad RAM.

... or a bad CPU. After replacing the upper CPU and reorganizing the RAM (it
was not contiguous), the machine has now run for 9 hours without problems.
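
For reference, the mechanism Eduardo describes can be sketched roughly as
below. This is a simplified model and not NetBSD code: the ET (enable traps)
bit position is taken from the SPARC V8 PSR layout, while the enum and
function names are hypothetical and exist only for illustration. It covers
exception traps only and assumes the board turns CPU error mode into a
watchdog reset.

```c
#include <stdint.h>

/* SPARC V8 PSR: ET (enable traps) is bit 5. */
#define PSR_ET 0x00000020u

/* Hypothetical names, for illustration only. */
enum trap_outcome {
	TRAP_TAKEN,		/* normal trap entry; hardware clears ET */
	TRAP_ERROR_MODE		/* CPU halts in error mode, seen as a watchdog reset */
};

/*
 * Model what happens when an exception trap is raised with the given PSR.
 * With ET clear the CPU cannot enter a handler, so it drops into error mode.
 * With ET set the trap is taken and ET is cleared on entry, which is why a
 * second trap inside a handler (before ET is re-enabled) is fatal.
 */
static enum trap_outcome
raise_exception(uint32_t *psr)
{
	if ((*psr & PSR_ET) == 0)
		return TRAP_ERROR_MODE;
	*psr &= ~PSR_ET;	/* trap entry disables further traps */
	return TRAP_TAKEN;
}
```

Calling raise_exception() twice in a row on a PSR that starts with ET set
models a fault inside a trap handler: the first call is taken normally, the
second ends in error mode.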
-- 
Juergen Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig (Germany)