Subject: Re: Multia problems
To: Erik Rungi <rungus@openface.ca>
From: Chris G. Demetriou <cgd@cs.cmu.edu>
List: port-alpha
Date: 03/25/1997 13:19:00
> I have an Alpha Multia 166MHz that I've managed to get the 1.2 port
> running on, and things were working great for about 20 days or so (20
> days uptime no less).
> 
> Then, all of a sudden, the machine decided that it could stay alive
> for no more than 5 minutes, ever.

So, I had the same problem the other day, but it was after the machine
had been up for several months.


> It crashes randomly in the following ways:
> 
> 1. locks up solid
> 2. cold resets itself with no warning (straight back to the boot prom)
> 3. In single user mode sometimes it freezes but the cursor will still
>    move around
> 4. Sometimes it panics (I see panic: .. for 0.003 seconds, then it resets)
> 5. Sometimes it panics properly and I get messages like:
> 
> panic: machine check: vec 0x670, pc=0x205C0, ra=0xfffffc000024dae0
> syncing disks... done
> dumping to dev 801, offset 111664
> dump device bad
> 
> rebooting...

I've seen all of those, plus:

(1) endless machine-checks and/or processor correctable errors on
startup,

(2) "panic: user requested console halt", which isn't even supposed to
be _possible_ on the multia,

(3) apparent panics where it'll crash back to the console, and the
console will respond to typing, but with the typing showing up garbled
on the screen because the console software doesn't seem to have been
properly initialized.


> The only thing in common with these events is they usually occur 2-10
> minutes after boot up, usually faster if I try to do more "things"
> (like start X, copy files, or otherwise type things in on the
> keyboard), and if I'm doing "things" when it dies, it tends to do #1
> most often, while if I'm not doing things, #5 is more likely to occur
> (but #5 is still unusual; it's usually #1 or #2 in most cases).
> 
> Even if I do absolutely nothing, just leave it in single user mode
> it still freezes up after 2-10 minutes (this is a rough time estimate,
> I wasn't watching the clock or anything).

Yup.


> I'm using the generic kernel that comes with the distribution of 1.2.

I was, too, and then when it started happening (spontaneously; no
changes in system software or hardware!) I tried newer kernels, etc.
"No dice."


> My guess is that some piece of hardware has flaked out and the thing
> is now a doorstop, but I'm hoping otherwise.  If anybody has any ideas
> here, I'd greatly appreciate your help!

My guess is hardware lossage, as well.  I've boxed up the multia I was
using, and am in the process of replacing it with one of my older
486en.  I don't care about the speed difference (even if there would
be a speed difference; not entirely sure 8-), but I need the machine
to be up, and stable, and be a PPP gateway...


BTW, to those of you out there considering buying multias: I've now
upgraded my "don't buy unless you want a cheap, but very slow, alpha
at home," to simply "don't buy."  8-S


cgd