Subject: Re: port-alpha/5546: port-alpha/lost a stack? exception_restore_regs bombs
To: None <mjacob@nas.nasa.gov>
From: Jason Thorpe <thorpej@nas.nasa.gov>
List: port-alpha
Date: 06/05/1998 12:23:51
On Fri, 5 Jun 1998 12:12:25 -0700 (PDT) 
 mjacob@nas.nasa.gov wrote:

 > >Description:
 > Running on a 128MB Alpha 8200, running a moderate disk exerciser,
 > the system panic'ed (here's the extended printout, with the
 > PAL logout area registers, perhaps Ross who knows PAL code
 > better, could tell us in more detail what cooks-
 > this printout code isn't checked in yet):

If the stack went away, you would get a "Kernel Stack Not Valid Halt"; no
machine check would occur.  If the pointer was invalid, you'd get a
memory management fault.

What I suspect is happening is that you're getting a memory error.  In
this case, printing out the EV5 portion of the logout area is not sufficient.
You also need to print out the platform-specific (i.e. KN8AE) logout area,
which will have the ECC memory error information, etc.

(At least, it looks like you only have the EV5 portion... maybe I'm mistaken.)
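
As a rough illustration of what the EV5 portion can and cannot tell you, the
"Fill Syndrome" value in the quoted printout can be split into its two
per-quadword syndrome bytes.  This is only a sketch: it assumes the EV5
FILL_SYN layout keeps the low-quadword syndrome in bits 7:0 and the
high-quadword syndrome in bits 15:8, and the helper names are hypothetical
(they are not from the kernel source).

```c
#include <stdint.h>

/*
 * Hypothetical helpers, not part of the NetBSD/alpha sources.
 * Assumed layout of the EV5 FILL_SYN register:
 *   bits  7:0  syndrome for the low quadword of the fill
 *   bits 15:8  syndrome for the high quadword of the fill
 */
static inline unsigned
fill_syn_lo(uint64_t fill_syn)
{
	return (unsigned)(fill_syn & 0xff);
}

static inline unsigned
fill_syn_hi(uint64_t fill_syn)
{
	return (unsigned)((fill_syn >> 8) & 0xff);
}

/*
 * A nonzero syndrome in either half suggests the check bits did not
 * match the data on that fill.  It does not say where the bad data
 * came from.
 */
static inline int
fill_syn_flags_error(uint64_t fill_syn)
{
	return fill_syn_lo(fill_syn) != 0 || fill_syn_hi(fill_syn) != 0;
}
```

Under that assumed layout, the value 0x9000 from the printout gives a high
syndrome of 0x90 and a low syndrome of 0x00.  That is consistent with a
data error on a fill, but identifying the failing memory still needs the
platform-specific logout area.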

 > 
 >       Processor Machine Check (670), Code 0x100000096
 > 	PAL temp[0-1]		= 0x0000000000000000 0x0000006164000000
 > 	PAL temp[2-3]		= 0xfffffc00003004d4 0x0000000000008680
 > 	PAL temp[4-5]		= 0xfffffe00003a375c 0x0000000000000006
 > 	PAL temp[6-7]		= 0x0000000000000001 0xfffffc00003003e8
 > 	PAL temp[8-9]		= 0x1f1e161514020100 0xfffffc0000300474
 > 	PAL temp[10-11]		= 0xfffffc0000300354 0xfffffc0000300418
 > 	PAL temp[12-13]		= 0xfffffc00003003b8 0x0000005555400000
 > 	PAL temp[14-15]		= 0x0000000000000000 0x00000000040385d9
 > 	PAL temp[16-17]		= 0x0000009806700801 0x0000000000000000
 > 	PAL temp[18-19]		= 0x00000001fffff418 0xfffffe00073c59d8
 > 	PAL temp[20-21]		= 0x0000000006778000 0xfffffc0000300444
 > 	PAL temp[22-23]		= 0xfffffc000053c1d0 0x000000000673a000
 > 	shadow[0-1]			= 0x0000000000000000 0x0000000000000000
 > 	shadow[2-3]			= 0x0000000000000000 0x0000000000000000
 > 	shadow[4-5]			= 0x0000000000000000 0x0000000000000000
 > 	shadow[6-7]			= 0x0000000000000000 0x0000000000000000
 > 
 >         Excepting Instruction Addr     = 0xfffffc0000300354
 >         Summary of arithmetic traps    = 0x0000000000000000
 >         Exception mask                 = 0x0000000000000000
 >         Base address for PALcode       = 0x0000000000018000
 >         Interrupt Status Reg           = 0x0000000000000000
 >         Current setup of EV5 IBOX      = 0x0000006164000000
 >         I-CACHE Reg Data parity error  = 0x0000000000000800
 >         D-CACHE error Reg              = 0x0000000000000000
 >         Effective VA                   = 0xfffffe00003a3658
 >         Reason for D-stream            = 0x0000000000014350
 >         EV5 SCache address             = 0xffffff000001d28f
 >         EV5 SCache TAG/Data parity     = 0x0000000000000000
 >         EV5 BC_TAG_ADDR                = 0xffffff80010d6fff
 >         EV5 EI_ADDR Phys addr of Xfer  = 0xffffff000011d6df
 >         Fill Syndrome                  = 0x0000000000009000
 >         ei_stat reg                    = 0xfffffff004ffffff
 >         ld_lock                        = 0xffffff0004b363df
 > 
 > unexpected machine check:
 > 
 >     mces    = 0x1
 >     vector  = 0x670
 >     param   = 0xfffffc0000008b10
 >     pc      = 0xfffffc0000300354
 >     ra      = 0xfffffc00003002e0
 >     curproc = 0xfffffe00003a3600
 >         pid = 342, comm = diskex
 > 
 > panic: machine check
 > syncing disks... 1 1 1 done
 > 
 > The PC decodes as:
 > 
 > (gdb) x/i 0xfffffc0000300354
 > 0xfffffc0000300354 <exception_restore_regs>:    ldq     v0,0(sp)
 > 
 > 
 > I'll retain the kernel and core dump if anyone wants to look at it.
 > >How-To-Repeat:
 > 	
 > >Fix:
 > 	
 > >Audit-Trail:
 > >Unformatted:

Jason R. Thorpe                                       thorpej@nas.nasa.gov
NASA Ames Research Center                            Home: +1 408 866 1912
NAS: M/S 258-5                                       Work: +1 650 604 0935
Moffett Field, CA 94035                             Pager: +1 650 428 6939