Subject: Re: Examining core dump..
To: Matthew Jacob <mjacob@ns.feral.com>
From: Chris G. Demetriou <cgd@pa.dec.com>
List: port-alpha
Date: 11/10/1997 09:53:15
> Machine checks are for exception conditions in alpha. Interrupts
> as well as memory errors.
> 
> In -current (and for a while) all of the known machine check
> conditions don't lead directly to a panic. There should have
> been a printf like:
> 
>   panic("unexpected interrupt: type 0x%lx vec 0x%lx a2 0x%lx\n", a0, a1, a2);
> 
> What were the contents? What was the release && h/w you're running anyway
> (I forget).


This is a really scary statement, considering that you've apparently
changed the interrupt delivery code substantially.


On the Alpha, exceptional conditions like device interrupts,
interprocessor interrupt requests, performance monitor interrupts, and
machine checks are expressed as "interrupts."  (There are other types
of exceptional conditions, e.g. memory management faults, instruction
faults, unaligned access faults, system calls, floating point
exceptions, etc., which are expressed differently.)

Machine checks (and their close cousins, correctable errors) indicate
a hardware problem or serious software bug.  Correctable errors
typically signal things like memory bit-flips (correctable by ECC).
Machine checks indicate either uncorrectable memory problems, "other
hardware problems," or software bugs (like the OS touching device or
memory space where there is no device or memory).


The default handler for machine checks and correctable errors (and,
looking through the code, nothing seems to install a custom handler)
causes the following behaviour:

	(1) On correctable errors, a warning message is printed.

	(2) On expected machine checks (during device probes),
	    a flag is set.

	(3) On unexpected machine checks, the system panics.
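
For concreteness, here is a rough sketch in C of the three cases above.
Every name in it (default_error_handler, struct mc_state, the enums) is
made up for illustration, and it is not the actual kernel code; in
particular, a real handler would call panic() in case (3) rather than
return a value.

```c
#include <assert.h>
#include <stdio.h>

/* All names here are hypothetical; this is not the NetBSD/alpha source. */
enum error_kind { ERR_CORRECTABLE, ERR_MACHINE_CHECK };
enum outcome { OUT_WARNED, OUT_FLAGGED, OUT_PANIC };

struct mc_state {
	int expected;	/* set by probe code before touching a possibly absent device */
	int received;	/* set by the handler when an expected machine check arrives */
};

static enum outcome
default_error_handler(enum error_kind kind, struct mc_state *st)
{
	assert(st != NULL);

	if (kind == ERR_CORRECTABLE) {
		/* (1) correctable error: print a warning and carry on */
		fprintf(stderr, "warning: correctable error\n");
		return OUT_WARNED;
	}
	if (st->expected) {
		/* (2) expected machine check: set a flag for the probe code */
		st->received = 1;
		return OUT_FLAGGED;
	}
	/* (3) unexpected machine check: a real kernel would panic() here */
	return OUT_PANIC;
}
```

Probe code would set "expected" before the access, clear it afterwards,
and then look at "received" to decide whether the device is there.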


You seem to indicate that interrupts and memory errors are both types
of machine checks.  (That's what your first sentence says...)  This is
absolutely incorrect.

Machine checks and correctable errors are both types of interrupts.
(Actually, they're both the _same_ type, and you distinguish between
them by bits in the Machine Check Error Summary register.)  Memory
errors can cause either machine checks or correctable errors,
depending on whether or not they were, in fact, correctable.
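
To make the relationship concrete, here is a small sketch in C of how
the classification might go.  The interrupt type numbers and the MCES
bit positions are written from memory and should be treated as
assumptions to check against the PALcode documentation; the function
and enum names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Interrupt types as passed to the interrupt entry point (assumed values). */
#define INTR_XPROC	0	/* interprocessor interrupt */
#define INTR_CLOCK	1	/* clock interrupt */
#define INTR_ERROR	2	/* machine check OR correctable error */
#define INTR_DEVICE	3	/* device interrupt */
#define INTR_PERF	4	/* performance monitor interrupt */

/* Machine Check Error Summary bits (assumed values). */
#define MCES_MIP	0x1	/* machine check in progress */
#define MCES_SCE	0x2	/* system correctable error */
#define MCES_PCE	0x4	/* processor correctable error */

enum error_class { UNCLASSIFIED, CORRECTABLE, MACHINE_CHECK };

/* Only one interrupt type carries machine checks and correctable errors. */
static int
is_error_interrupt(uint64_t intr_type)
{
	return intr_type == INTR_ERROR;
}

/* Both kinds of error share one type; the MCES bits tell them apart. */
static enum error_class
classify_error(uint64_t intr_type, uint64_t mces)
{
	/* Callers should only get here for the shared error interrupt type. */
	assert(intr_type == INTR_ERROR);

	if (mces & MCES_MIP)
		return MACHINE_CHECK;
	if (mces & (MCES_SCE | MCES_PCE))
		return CORRECTABLE;
	return UNCLASSIFIED;
}
```

Note that a device interrupt never reaches classify_error() at all,
which is the point: machine checks are one kind of interrupt, not the
other way around.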


On a slightly different note, you might note that the current version
of the alpha "interrupt.c" has copyright notices on it that
technically prohibit it from being used or distributed.  "You might
fix that."  (It's your copyright notice, Matt.)


chris