Subject: Re: Processor correctable error?
To: Michael T. Stolarchuk <mts@rare.net>
From: Chris G. Demetriou <cgd@pa.dec.com>
List: port-alpha
Date: 06/11/1998 14:33:24
> ok, here's another machine check. This one comes from one of the EB164's
> i'm playing with, all of them exhibit the same machine check.

Are they actually EB164s?  (I doubt it.)  If they have the Pyxis
chipset, you probably want to be running NetBSD-(very-)current on them...
(If you're not, I could believe you're seeing the problem you
mention.)

>     unexpected machine check:
> 
> 	mces    = 0x1
> 	vector  = 0x670
> 	param   = 0xfffffc0000006068
> 	pc      = 0xfffffc0000495500
> 	ra      = 0xfffffc00004954e0
> 	curproc = 0xfffffc0000520058
> 	    pid = 0, comm = 
> 
>     panic: machine check
>     Stopped at      Debugger+0x4:   ret     zero,(ra)
>     ..
> 
> so mces says that the error is uncorrectable...

No, mces says the error is a machine check.

Machine checks may or may not be correctable, depending on their exact
cause...

In particular:

> pc:..499500 is in cia_swiz_mem_read_1...

A machine check raised from within the memory read functions is a
fatal kernel bug (i.e. it indicates a bug in either the driver software
or the machine-dependent code that implements those functions).  It
probably _could_ be corrected (really, ignored) by operating system
software, but it would not really be correct to do so.

The PALcode, on the other hand, has no idea what to do when presented
with the situation, so reports it as a machine check.
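
To make the mces/vector point concrete, here is a rough sketch (not
taken from the NetBSD sources) of how the two values in the dump could
be decoded.  The bit and vector assignments are my understanding of
the Alpha architecture documentation, and every macro and function
name below is made up for illustration, so check them against your
own headers before relying on them.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed MCES bit layout; hypothetical names, verify before use. */
#define MCES_MIP 0x01UL /* machine check in progress */
#define MCES_SCE 0x02UL /* system correctable error in progress */
#define MCES_PCE 0x04UL /* processor correctable error in progress */

/* Assumed SCB vector offsets for error and machine-check entries. */
#define VEC_SYS_CORRECTABLE  0x620UL
#define VEC_PROC_CORRECTABLE 0x630UL
#define VEC_SYS_MCHECK       0x660UL
#define VEC_PROC_MCHECK      0x670UL

/* Nonzero if mces says a machine check is in progress. */
int mces_is_machine_check(uint64_t mces)
{
    return (mces & MCES_MIP) != 0;
}

/* Nonzero if mces reports a (system or processor) correctable error. */
int mces_is_correctable_report(uint64_t mces)
{
    return (mces & (MCES_SCE | MCES_PCE)) != 0;
}

/* Map an SCB vector to a short description. */
const char *scb_vector_name(uint64_t vector)
{
    switch (vector) {
    case VEC_SYS_CORRECTABLE:  return "system correctable error";
    case VEC_PROC_CORRECTABLE: return "processor correctable error";
    case VEC_SYS_MCHECK:       return "system machine check";
    case VEC_PROC_MCHECK:      return "processor machine check";
    default:                   return "unknown vector";
    }
}

/* Print a one-line summary of an mces/vector pair. */
void print_mcheck_summary(uint64_t mces, uint64_t vector)
{
    printf("mces=0x%" PRIx64 " vector=0x%" PRIx64 ": %s%s\n",
        mces, vector, scb_vector_name(vector),
        mces_is_machine_check(mces) ? " (machine check in progress)" : "");
}
```

With the values from the dump, print_mcheck_summary(0x1, 0x670) would
describe a processor machine check in progress.  Note that neither
value says whether the underlying condition could have been recovered
from; that still depends on the exact cause.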


> i've got ddb to answer, but it doesn't give a trace... which register should i
> use as a base for the trace? 

Well, ra and sp are good choices, but those will first take you
through the kernel trap handlers, etc.  The information you really
want is the stuff "above" the machine-check handler invocation, which
ddb's trace facility probably won't be able to give you easily.



cgd