Port-sparc64 archive


Re: SIR Reset Watchdog Reset



On Fri, 12 Mar 2010, Jochen Kunz wrote:

> I have:
> NetBSD 5.0_STABLE (GENERIC.MP) #0: Fri Nov 27 10:40:10 CET 2009
>         jkunz@MissSophie:/datengrab/src/NetBSD/release-5/objdir/sparc64/sys/arch/sparc64/compile/GENERIC.MP
> total memory = 1024 MB
> avail memory = 992 MB
> timecounter: Timecounters tick every 10.000 msec
> mainbus0 (root): SUNW,Ultra-60 (Netra t 1120/1125): hostid 12345678
> cpu0 at mainbus0: SUNW,UltraSPARC-II @ 400.004 MHz, UPA id 0
> cpu0: 32K instruction (32 b/l), 16K data (32 b/l), 4096K external (64 b/l)
> cpu1 at mainbus0: SUNW,UltraSPARC-II @ 400.004 MHz, UPA id 2
> cpu1: 32K instruction (32 b/l), 16K data (32 b/l), 4096K external (64 b/l)
> 
> This machine crashes about once a week with:
> NetBSD/sparc64 (Maja) (console)
> 
> login:
> SIR Reset
> 
> Watchdog Reset
> Externally Initiated Reset
> {2} ok 
> 
> The machine serves as an NFS and NIS server with nearly no load.
> / and /usr are on a pair of RAIDframe mirrored disks, /home is an
> external hardware RAID5. Filesystems are mounted with option "log".
> 
> Software bug or broken hardware?

Dunno.

SIR is a software-initiated reset.  locore.s has sir instructions sprinkled 
around in places where the kernel gets so stuffed up it can't recover.  Once 
you get one of those you should do:

ok .trap-registers
ok .registers
ok ctrace
ok 0 .window
ok 1 .window
ok 2 .window

until you get to

ok 7 .window

The most important information is .trap-registers and .registers.  You 
need to correlate the tpc and tnpc addresses with specific locations in 
the specific kernel you're running to determine what sequence of traps got 
you in the position to execute the sir instruction.  If you can figure out 
which specific sir instruction you hit, that often gives you enough 
information to figure out why the kernel took a dive.
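
One way to do that correlation is to match each address against the symbol table of the exact kernel that was running. The following is only a minimal sketch in Python, assuming you have saved the output of something like `nm -n` on that kernel in the usual three-column "address type name" layout. The symbol names and addresses in SAMPLE_NM are invented for illustration and do not come from any real kernel; the function names parse_nm and lookup are likewise just names chosen for this example.

```python
import bisect

# Invented sample in "address type name" form; not taken from a real kernel.
SAMPLE_NM = """\
0000000001008000 T example_start
0000000001008100 t example_trap_handler
0000000001008200 T example_panic_path
                 U example_undefined
0000000001900000 D example_data
"""

def parse_nm(text):
    """Collect text (code) symbols as a sorted list of (address, name)."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        # Undefined symbols have no address column; skip malformed lines.
        if len(parts) != 3:
            continue
        addr, kind, name = parts
        # Only 'T'/'t' symbols are code, which is what tpc/tnpc point into.
        if kind.lower() != "t":
            continue
        try:
            symbols.append((int(addr, 16), name))
        except ValueError:
            continue
    symbols.sort()
    return symbols

def lookup(symbols, addr):
    """Return (name, offset) of the last symbol at or below addr, or None."""
    addrs = [a for a, _ in symbols]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None
    base, name = symbols[i]
    return name, addr - base

if __name__ == "__main__":
    syms = parse_nm(SAMPLE_NM)
    # Pretend this value was read from the tpc column of .trap-registers.
    tpc = 0x1008124
    hit = lookup(syms, tpc)
    if hit is None:
        print("address is below the first text symbol")
    else:
        print("%#x is %s+%#x" % (tpc, hit[0], hit[1]))
```

The result is only the nearest preceding symbol plus an offset, so an address past the end of the last function would still be reported against that function. Treat it as a starting point for looking at the disassembly around that offset, not as proof of where the trap happened.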

Eduardo


