Subject: Re: unexpected machine checks
To: None <nmanisca@vt.edu, port-alpha@netbsd.org>
From: Ross Harvey <ross@ghs.com>
List: port-alpha
Date: 01/12/2001 15:58:52
> From: Nick <nmanisca@vt.edu>
>
> My Alpha had an Unexpected Machine Check for the first time
> today... I was burning a CD with cdrecord and I had just ssh'd
> into the machine when the panic occurred.  On the console I
> found the following (approx):
>
> Unexpected Machine Check
> 	mces	= 0x1
> 	vector	= 0x660
> 	param	= 0xfffffc0000006000
> 	pc	= 0xfffffc0000302c20
> 	ra	= 0xfffffc0000302bfc
> 	curproc	= 0x0
>
> BTW this is an AlphaStation 200 4/233 running NetBSD 1.5
>
> So I rebooted and continued using the machine for a while,
> then while burning another CD and using amp (to convert mp3's
> to wav's) it suffered another machine check (approx):
>
> Unexpected Machine Check
> 	mces	= 0x1
> 	vector	= 0x660
> 	param	= 0xfffffc0000006000
> 	pc	= 0xfffffc0000302c20
> 	ra	= 0xfffffc0007372f10
> 	curproc	= 0x0
> 		pid = 362, comm = amp
>
> Is this indicative of bad memory?  How would I go about
> finding out what these machine checks mean?
>
> Maybe my reliability problems burning CDs and these
> machine checks are related?  I think I had better stop
> trying to burn CDs until I get this sorted out... I've
> destroyed 5 today.


The coaster-effect is one reason I went with CD-RW ... you can work out
the bugs that way and then burn a CD-R for giveaway when it's perfected.
(Someday let me tell you about how this interacts with Murphy's law... :-)

That system machine check (vector 0x660) does mean an ECC or parity-type
error, but it isn't always caused by bad RAM; it can and does happen with
bogus device accesses caused by things like software or chipset bugs.
Step one is to see which kernel routine is at 0xfffffc0000302c20.
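One way to do that lookup is to take the kernel's symbol table, sorted by address, and find the last code symbol at or below the PC. The sketch below assumes you feed it `nm -n` output from the exact kernel image that panicked (normally `/netbsd`) in the usual "address type name" format. The script and its function names are hypothetical helpers, not part of NetBSD.

```python
import bisect
import sys


def parse_nm(text):
    """Parse `nm -n`-style lines ("<hex addr> <type> <name>") into sorted (addr, name) pairs."""
    syms = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank lines and undefined symbols ("U name")
        addr, kind, name = parts
        if kind not in ("T", "t"):
            continue  # keep only text (code) symbols
        syms.append((int(addr, 16), name))
    syms.sort()
    return syms


def lookup(syms, pc):
    """Return (name, offset) of the last code symbol at or below pc, or None."""
    addrs = [a for a, _ in syms]
    i = bisect.bisect_right(addrs, pc) - 1
    if i < 0:
        return None
    addr, name = syms[i]
    return name, pc - addr


if __name__ == "__main__":
    # Usage: nm -n /netbsd | python3 whereis.py 0xfffffc0000302c20
    table = parse_nm(sys.stdin.read())
    for arg in sys.argv[1:]:
        hit = lookup(table, int(arg, 16))
        print(arg, "->", f"{hit[0]}+0x{hit[1]:x}" if hit else "?")
```

Both panics above report the same pc, so a single lookup answers the question for both; the differing ra values are worth checking the same way.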

However, you have a second problem. You really shouldn't even look
cross-eyed at your box while burning a CD. Don't touch the keyboard or do
any network operations, let alone encode or decode MP3s. In this case,
though, it's probably good (for us, anyway) that you did, as that error
should not happen no matter what. Do you have any history of single-bit
errors? Vector 620 or 630?
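To sort console reports into correctable (single-bit) errors versus machine checks, a rough decoder for the `vector` and `mces` fields might look like the sketch below. The vector and MCES bit values are as I recall them from the Alpha system control block layout and NetBSD's `sys/arch/alpha/include/alpha_cpu.h`; treat them as assumptions and check that header before relying on them. The `decode` function itself is a hypothetical helper.

```python
# Assumed Alpha SCB vectors for error/machine-check interrupts; verify against alpha_cpu.h.
SCB_VECTORS = {
    0x620: "system correctable error",
    0x630: "processor correctable error",
    0x660: "system machine check",
    0x670: "processor machine check",
}

# Assumed bit layout of the MCES (machine check error summary) register.
MCES_BITS = {
    0x01: "MIP (machine check in progress)",
    0x02: "SCE (system correctable error)",
    0x04: "PCE (processor correctable error)",
    0x08: "DPC (processor correctable reporting disabled)",
    0x10: "DSC (system correctable reporting disabled)",
}

CORRECTABLE_VECTORS = (0x620, 0x630)


def decode(mces, vector):
    """Return (vector description, list of set MCES flags, is_correctable)."""
    kind = SCB_VECTORS.get(vector, "unknown vector")
    flags = [desc for bit, desc in MCES_BITS.items() if mces & bit]
    return kind, flags, vector in CORRECTABLE_VECTORS


if __name__ == "__main__":
    # Fields from the console reports quoted above.
    print(decode(0x1, 0x660))
```

Under these assumptions, both reports above decode as a system machine check with only the in-progress bit set, which is why the question of earlier 620/630 (correctable) events matters.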

	Ross