Subject: Re: NMI / hanging
To: None <gendalia@iastate.edu>
From: Mike Long <mike.long@analog.com>
List: port-i386
Date: 11/24/1995 18:05:56
>From: gendalia@iastate.edu
>Date: Thu, 23 Nov 1995 12:18:46 CST

>I have had my computer hang 4 times between 3:30pm last night and
>8am this morning.  I was giving it more of a workout than I had
>in a while, so I don't believe it's related to what I'd just sup'd,
>more likely to be sick hardware.  (I was trying to compile /usr/src/lib
>each time it hung.)  The first three times I was running X in the
>only vt I was logged into, so I ended up just hitting reset since
>there wasn't anything else possible.  The fourth time I'd given up
>running X, and was logged in as root in a couple vts.
>
>In my first root vt it gave me:
>   NMI ... going to debugger
>   kernel: type 265 trap, code 0
>   stopped at 0x1006e029:  repe movsl  (%esi),%es:(%edi)
>   db>
>and I should probably just take debugging out of my kernel,
>since the only useful thing I know to do with it is press c
>and see what happens next.

The most likely source is a RAM parity error, which PC hardware
reports to the CPU as an NMI.  The fact that the NMI occurred during a
memory move (repe movsl) supports this interpretation.  Try reseating
your SIMMs and/or blowing out any dust that's around them, and see if
that makes the problem go away.
Of course, this interpretation flies right out the window if your
system doesn't have parity-checked (9-bit) memory.  Most newer PCs
don't, which is a big mistake IMNSHO.
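
For anyone unsure what the ninth bit buys you, here is a rough sketch
of the idea in Python.  It is only a model, not how any particular
chipset does it: the function names are made up for illustration, and
I have assumed even parity, whereas real hardware may use either
convention.

```python
def parity_bit(byte: int) -> int:
    """Even-parity bit for an 8-bit value: 1 if an odd number of bits are set."""
    return bin(byte & 0xFF).count("1") & 1


def store(byte: int) -> int:
    """Model a 9-bit memory cell: 8 data bits, with the parity bit kept in bit 8."""
    return (parity_bit(byte) << 8) | (byte & 0xFF)


def check(cell: int) -> bool:
    """True if the stored parity bit still matches the data bits on read-back."""
    return parity_bit(cell & 0xFF) == ((cell >> 8) & 1)


cell = store(0x5A)           # write a byte along with its parity bit
one_flip = cell ^ 0x04       # hypothetical fault: one data bit flips
two_flips = cell ^ 0x06      # hypothetical fault: two data bits flip

good = check(cell)           # parity matches, nothing to report
caught = not check(one_flip) # mismatch: this is where hardware would raise the NMI
missed = check(two_flips)    # parity still matches, so the error slips through
```

A single flipped bit is detected but cannot be located or corrected,
and an even number of flipped bits goes unnoticed.  That is the limit
of plain parity, but it still beats having no check at all.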

Even if you don't know how to use DDB, it's a useful thing to have
around.  Including messages from DDB in problem reports (as you have
done) may help others diagnose your problem.  A stack trace (DDB's
"bt" command) is also usually useful.
-- 
Mike Long <mike.long@analog.com>           http://www.shore.net/~mikel
VLSI Design Engineer         finger mikel@shore.net for PGP public key
Analog Devices, CPD Division          CCBF225E7D3F7ECB2C8F7ABB15D9BE7B
Norwood, MA 02062 USA                assert(*this!=opinionof(Analog));