Subject: Re: Machine Check message on Alpha 164LX
To: Hyung Min SEO <HMSEO@sec.samsung.com>
From: Jason Thorpe <thorpej@nas.nasa.gov>
List: port-alpha
Date: 06/15/1999 11:40:43
On Tue, 15 Jun 1999 15:21:18 +0900 
 Hyung Min SEO <HMSEO@sec.samsung.com> wrote:

 > Hello everyone,
 > 
 > Has anyone seen the following error message on an Alpha 164LX?
 > Our customer got the following machine check message
 > on the display while setting up NetBSD on the LX.
 > They are using NetBSD 1.2G on the 164LX as a router.

NetBSD 1.2G is _quite_ old.  They really ought to upgrade to 1.4,
which has fixed a significant number of VM problems, etc.

 > ------------------------------------------------------------------
 > Machine Check :
 > 
 > mces		= 0x1
 > vector		= 0x660
 > param		= 0xfffffc0000006068
 > pc		= 0xfffffc000023036c
 > ra		= 0xfffffc000037487c
 > curproc		= 0xfffffe008aa75a00
 > pid		= 1022, comm = gated
 > 
 > panic : machine check
 >   stopped at debugger + 044: ret zero, (ra),
 > db>
 > ----------------------------------------------------------------
 > My question is as below.
 > 
 > 	- What condition caused "machine check" to pop up on the display?

It could be a number of things... see below.

 > 	- Where does "machine check" come from ? (OS, BIOS, others ?)

The PALcode sends the machine check interrupt to the kernel, which
panics.
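
As a rough illustration of how the "vector" and "mces" values in the
panic output can be read, here is a minimal sketch in C.  The vector
offsets and MCES bit values are my recollection of the Alpha
architecture definitions and should be treated as assumptions to check
against the architecture manual; the function names are made up for
this example and are not the kernel's.

```c
#include <stdint.h>
#include <stdio.h>

/* Error-interrupt vector offsets (assumed values; verify before relying on them). */
#define VEC_SYS_ERROR   0x620   /* system correctable error */
#define VEC_PROC_ERROR  0x630   /* processor correctable error */
#define VEC_SYS_MCHECK  0x660   /* system machine check */
#define VEC_PROC_MCHECK 0x670   /* processor machine check */

/* Machine Check Error Summary (MCES) bits (assumed values). */
#define MCES_MIP 0x01           /* machine check in progress */
#define MCES_SCE 0x02           /* system correctable error pending */
#define MCES_PCE 0x04           /* processor correctable error pending */

/* Hypothetical helper: map a vector offset to a human-readable class. */
const char *mcheck_vector_name(uint64_t vector)
{
    switch (vector) {
    case VEC_SYS_ERROR:   return "system correctable error";
    case VEC_PROC_ERROR:  return "processor correctable error";
    case VEC_SYS_MCHECK:  return "system machine check";
    case VEC_PROC_MCHECK: return "processor machine check";
    default:              return "unknown";
    }
}

/* Hypothetical helper: print a one-line summary of each value. */
void mcheck_describe(uint64_t mces, uint64_t vector)
{
    printf("vector 0x%llx: %s\n", (unsigned long long)vector,
        mcheck_vector_name(vector));
    printf("mces   0x%llx:%s%s%s\n", (unsigned long long)mces,
        (mces & MCES_MIP) ? " machine-check-in-progress" : "",
        (mces & MCES_SCE) ? " system-correctable" : "",
        (mces & MCES_PCE) ? " processor-correctable" : "");
}
```

Under those assumptions, calling mcheck_describe(0x1, 0x660) with the
values from the report would classify this as a system machine check,
i.e. one reported by the core logic rather than detected inside the
CPU itself.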

 > 	- Is it possible to figure out the specific memory address the
 > CPU accessed last?
 > 	- If so, how could I find out?

There is a "logout frame" (machine check information) set up by the PALcode.
It must be decoded in order to determine what happened.  This is both
CPU (ev4, ev5, etc.) and core-logic (ALCOR, Pyxis, APECS, etc.) specific.
We only have code to decode this on a couple of the server systems in
NetBSD 1.4.
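
To give an idea of what the generic part of that decoding looks like,
here is a minimal sketch in C.  It assumes that the third value in the
panic output is the kernel address of the logout frame, that this
address lies in the direct-mapped K0SEG region, and that the frame
starts with a common 16-byte header (byte count, flags with a retry
bit, CPU-specific offset, system-specific offset).  The header layout
is my recollection of the architecture's common logout header, and all
names below are hypothetical.  Everything past the header is CPU and
core-logic specific and is not covered here.

```c
#include <stddef.h>
#include <stdint.h>

/* Direct-mapped kernel segment bounds (assumed values). */
#define K0SEG_BASE 0xfffffc0000000000ULL
#define K0SEG_END  0xfffffdffffffffffULL

/* Common logout frame header, decoded (layout is an assumption). */
struct logout_hdr {
    uint32_t size;          /* total frame size in bytes */
    int      retry;         /* nonzero if the operation may be retried */
    int      second_error;  /* nonzero if a second error was seen */
    uint32_t cpu_offset;    /* offset of the CPU-specific section */
    uint32_t sys_offset;    /* offset of the system-specific section */
};

/* Read a little-endian 32-bit value from a byte buffer. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
        ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Convert a K0SEG virtual address to physical; -1 if outside K0SEG. */
int k0seg_to_phys(uint64_t va, uint64_t *pa)
{
    if (va < K0SEG_BASE || va > K0SEG_END)
        return -1;
    *pa = va & ~K0SEG_BASE;
    return 0;
}

/* Parse the common header from a raw frame copy; -1 if implausible. */
int logout_parse_hdr(const uint8_t *frame, size_t len, struct logout_hdr *out)
{
    uint32_t flags;

    if (len < 16)
        return -1;
    out->size = le32(frame);
    flags = le32(frame + 4);
    out->retry = (int)((flags >> 31) & 1);
    out->second_error = (int)((flags >> 30) & 1);
    out->cpu_offset = le32(frame + 8);
    out->sys_offset = le32(frame + 12);
    if (out->size < 16 || out->cpu_offset > out->size ||
        out->sys_offset > out->size)
        return -1;
    return 0;
}
```

With the address from the report, k0seg_to_phys(0xfffffc0000006068, &pa)
would give physical address 0x6068 under these assumptions.  The two
offsets then say where the CPU-specific and core-logic-specific
sections begin, which is where the chip-specific decoding has to take
over.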

        -- Jason R. Thorpe <thorpej@nas.nasa.gov>