Subject: Re: NMI interrupts
To: None <mcr@gateway.sandelman.ocunix.on.ca>
From: Gordon W. Ross <gwr@mc.com>
List: port-sun3
Date: 02/03/1996 22:00:24
> Date: Sat, 03 Feb 1996 18:30:39 -0500
> From: Michael Richardson <mcr@gateway.sandelman.ocunix.on.ca>

>   Under a load (compiling mh, downloading tk today), my sun3 seems to
> die with an NMI interrupt received:
> 
> login: nmi interrupt received
> Stopped at      _Debugger+0x6:  unlk    a6

> db> trace
> _Debugger(e080214,ee5bfd8,e08031a,0,0) + 6
> _nmi_intr(0,0,2e87e,dffe9e0,dffe994) + 22
> _isr_autovec(7c) + 68
> __isr_autovec() + a
> db> 
> 
>   At this point, if I continue, then the machine freezes, requiring a
> cold boot. (power switch... BREAK on ttya doesn't cut it)
> 
>   Ideas? What is NMI hooked up to?

An NMI is caused by either a memory error or the clock, and the NMI
clock is disabled, so it must be a memory error.  To find out precisely
what caused it, look at the "memory error register", which is a
location in OBIO space.  The register is mapped at the address shown in
prom_mappings[4] (the fifth entry, at _prom_mappings+0x10 below), and
you want two words at that address, i.e.:

	db> x/xx prom_mappings
	_prom_mappings:      fe00000     fe02000
	db> 
	_prom_mappings+0x8:          fe04000     fe06000
	db> 
	_prom_mappings+0x10:         fe08000     fe0a000
	db> x/xx 0xfe08000
	0xfe08000:          50ffffff    6e07f888
	db> c

The bits in that first byte (0x50) are:

/*
 *  Bits for the memory error register when used as parity error register
 */
#define PER_INTR    0x80    /* r/o - 1 = parity interrupt pending */
#define PER_INTENA  0x40    /* r/w - 1 = enable interrupt on parity error */
#define PER_TEST    0x20    /* r/w - 1 = write inverse parity */
#define PER_CHECK   0x10    /* r/w - 1 = enable parity checking */
#define PER_ERR24   0x08    /* r/o - 1 = parity error <24..31> */
#define PER_ERR16   0x04    /* r/o - 1 = parity error <16..23> */
#define PER_ERR08   0x02    /* r/o - 1 = parity error <8..15> */
#define PER_ERR00   0x01    /* r/o - 1 = parity error <0..7> */

/*
 *  Bits for the memory error register when used as ECC error register
 */
#define EER_INTR    0x80    /* r/o - ECC memory interrupt pending */
#define EER_INTENA  0x40    /* r/w - enable interrupts on errors */
#define EER_BUSHOLD 0x20    /* r/w - hold memory bus mastership */
#define EER_CE_ENA  0x10    /* r/w - enable CE recording */
#define EER_TIMEOUT 0x08    /* r/o - Sirius bus time out */
#define EER_WBACKERR    0x04    /* r/o - write back error */
#define EER_UE      0x02    /* r/o - UE, uncorrectable error  */
#define EER_CE      0x01    /* r/o - CE, correctable (single bit) error */
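
To make the decoding concrete, here is a rough sketch of how that first
byte could be picked apart using the parity-mode names listed above.
The helper names (memerr_first_byte, memerr_decode) are made up for
illustration and are not part of the kernel; the sketch also assumes a
parity machine, so on an ECC machine the same bit positions would carry
the EER_* names instead.

```c
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

/* Parity-mode bit names from the list above, most significant bit first. */
static const char *const per_names[8] = {
    "PER_INTR", "PER_INTENA", "PER_TEST", "PER_CHECK",
    "PER_ERR24", "PER_ERR16", "PER_ERR08", "PER_ERR00"
};

/*
 * The 68020 is big-endian, so the first byte at the register address
 * is the top byte of the longword that "x/xx" prints.
 */
static uint8_t memerr_first_byte(uint32_t word)
{
    return (uint8_t)(word >> 24);
}

/*
 * Hypothetical helper: write the names of the set bits into buf,
 * separated by spaces, and return how many bits were set.
 * Stops early if buf is too small.
 */
static int memerr_decode(uint8_t reg, char *buf, size_t len)
{
    size_t used = 0;
    int count = 0;

    if (len == 0)
        return 0;
    buf[0] = '\0';

    for (int i = 0; i < 8; i++) {
        if (reg & (0x80 >> i)) {
            int n = snprintf(buf + used, len - used, "%s%s",
                             count ? " " : "", per_names[i]);
            if (n < 0 || (size_t)n >= len - used)
                break;          /* out of room: keep what we have */
            used += (size_t)n;
            count++;
        }
    }
    return count;
}
```

Feeding it the sample longword, memerr_decode(memerr_first_byte(0x50ffffff), ...)
reports PER_INTENA and PER_CHECK: parity checking and the parity
interrupt are enabled, and none of the error bits are set in that
particular dump.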


Someday, we need to make the nmi handler figure out what caused
the NMI and print out the memory error address, bit, etc.

Gordon