Subject: Re: Decoding machine checks...
To: None <Riccardo.Veraldi@fi.infn.it>
From: None <kpneal@pobox.com>
List: port-alpha
Date: 09/13/2003 19:55:04
On Sun, Sep 14, 2003 at 01:40:23AM +0200, Riccardo.Veraldi@fi.infn.it wrote:
> 
> it could be your ecache
> I had very similar errors on an AlphaStation 500/333,
> and the problem was a bad ecache module.
> It could not be replaced (it is embedded in the motherboard), so I just had
> to throw my Alpha away. My error code was different from 660, though, so
> maybe that's not your problem. I hope so, for your sake.

Geez, I hope it isn't that. This cache is soldered onto the board as
well. I checked, because I was wondering if I could bump it up to
4MB of cache. No such luck.

I think I did notice jumpers to disable the Bcache on the motherboard.
Disabling the cache would suck, but it would suck less than discarding
the board.

I did notice that the documented maximum amount of memory for this
board is a half gig. I've got a whole gig. I yanked out the top two
DIMMs and I'm going to see how that goes. The trivial test shown
below didn't cause a crash. If all goes well I'll swap the two sticks
that are in with the two that are out. I need to know if I have a
bad stick of memory or not.
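
For repeating that test after each DIMM swap, a bounded version may be
handier than an open-ended dd. This is only a rough sketch: the function
name zero_fill_check and its two arguments are made up for illustration,
and it assumes /tmp is still the mfs mount. It writes a fixed number of
megabytes of zeros, reads them back, and checks that nothing nonzero
came out. It is not a real memory tester; it just puts some write and
read traffic on the memory backing the mfs.

```sh
#!/bin/sh
# Hypothetical helper (not from any NetBSD tool): write MB megabytes of
# zeros to FILE, read them back, and succeed only if every byte is zero.
zero_fill_check() {
    file="$1"   # path on the memory-backed filesystem, e.g. /tmp/blarg
    mb="$2"     # how many 1 MB blocks to write

    # Bounded write, unlike the open-ended dd that fills the filesystem.
    dd if=/dev/zero of="$file" bs=1048576 count="$mb" 2>/dev/null || return 2

    # Delete all NUL bytes and count what is left; anything left is bad.
    nonzero=$(tr -d '\000' < "$file" | wc -c | tr -d ' ')
    rm -f "$file"

    [ "$nonzero" -eq 0 ]
}

# Example use (size chosen arbitrarily):
#   zero_fill_check /tmp/blarg 64 && echo "read back clean"
```

A clean run proves little, since the crash itself is the real signal,
but running it the same way on each pair of sticks at least keeps the
comparison fair.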

> On Sat, 13 Sep 2003 kpneal@pobox.com wrote:
> 
> > On Sat, Sep 13, 2003 at 01:30:35AM -0400, kpneal@pobox.com wrote:
> > > I'll see if I can get the initial panic message next time it
> > > happens.
> >
> > Well, that was easy. I have an mfs /tmp.
> >
> > % dd if=/dev/zero of=/tmp/blarg
> >
> > Warning: received processor correctable error.
> > Warning: received processor correctable error.
> > Warning: received processor correctable error.
> > Warning: received processor correctable error.
> >
> > unexpected machine check:
> >
> >     mces    = 0x1
> >     vector  = 0x670
> >     param   = 0xfffffc0000006068
> >     pc      = 0xfffffc000051ca74
> >     ra      = 0xfffffc0000300ac8
> >     code    = 0x98
> >     curproc = 0xfffffc00051245c8
> >         pid = 119, comm = mount_mfs
> >
> > I'll try rearranging memory tomorrow. Oh, and the 'reboot' command
> > in ddb gave me another "correctable" error followed by a hang.
> >
> > *sigh*
> >
> > This is what happens when a household member decides to show who is in
> > charge by opening the back door to let the cats get fresh air. Having
> > outside air come in during the middle of the summer, bringing the
> > temperature in the computer room into the mid-90s (humid), just can't be
> > good for machines. I can imagine the temperature in the box being 10+
> > degrees higher than the room air temp, placing it dangerously close to
> > the edge of its operating range.
> >
> > Said household member is now removed and isn't coming back. Now I
> > just have to clean up the damage.
> >
> > I'm going to bed.
> >
> > Thanks for the help. Let's all cross our fingers for bad memory
> > and not a bad board.
> > --
> > Kevin P. Neal                                http://www.pobox.com/~kpn/
> >
> > "What is mathematics? The age-old answer is, of course, that mathematics
> >  is what mathematicians do." - Donald Knuth
> >
> >
-- 
Kevin P. Neal                                http://www.pobox.com/~kpn/
"Oh, I've heard that paradox a couple of times, but there's something
about a cat dying and I hate to think of such things."
  - Dr. Donald Knuth speaking of Schrodinger's cat, December 8, 1999, MIT