Subject: Halting CPUs at random or in long dumps?
To: None <port-alpha@netbsd.org>
From: NetBSD Bob <nbsdbob@weedcon1.cropsci.ncsu.edu>
List: port-alpha
Date: 08/07/2001 11:08:30
I have run into a problem of late on some of my Alphas (300 LX things).
I dunno if it is something related to current NetBSD (build as of a few
days ago) or something in the hardware going flaky.  It happens across
several machines, so I am hesitant to say it is hardware, yet.

The machines are running 1.5X and have 64 MB of RAM.  The drives are sd0-sd6
in StorageWorks shelves.

Symptom:  machine locks up on large dumps (e.g., dumping a 2 gig system
          disk into a ``mirror'' for live backup on a 1.8 gig fs).
          The mirror is the classical:
               dump 0f - /usr | ( cd /mnt; restore xf - )
          kind of thing.
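
Since the source disk is nominally bigger than the target fs, one sanity
check before the mirror step is that the used space on the source fits in
the free space on the target.  A minimal sketch, assuming a df that takes
the POSIX -k and -P flags; the function names used_kb, avail_kb and fits are
made up for illustration and are not part of the procedure above:

```shell
# used_kb DIR: KB in use on the filesystem holding DIR (column 3 of df -kP)
used_kb() { df -kP "$1" | awk 'NR==2 {print $3}'; }

# avail_kb DIR: KB still free on the filesystem holding DIR (column 4 of df -kP)
avail_kb() { df -kP "$1" | awk 'NR==2 {print $4}'; }

# fits USED AVAIL: succeeds when USED (KB) is no larger than AVAIL (KB)
fits() { [ "$1" -le "$2" ]; }

# Hypothetical use around the mirror pipeline (left commented out on purpose):
# if fits "$(used_kb /usr)" "$(avail_kb /mnt)"; then
#     dump 0f - /usr | ( cd /mnt; restore xf - )
# else
#     echo "source data will not fit on the mirror fs" >&2
# fi
```

This only rules out the target filling up; it says nothing about why the
machine locks up.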

I did not run into this on 1.5W or earlier.

Pressing the almighty reset button (gad, that sounds gateswareish)
brings the thing back up running fine, but with the obvious dead,
incomplete dump on the mirror fs.

Dumping in smaller pieces works fine.

Pressing abort during the dump sometimes exits cleanly; other times
nothing happens and a hard reset is required.  On the times I can
soft-abort, dump seems to be lost and won't continue.

No error messages (scsiverbose msgs included) show up anywhere that
would suggest a hardware or software problem.

It is almost as if the SCSI drivers get lost during the dump and
don't know where to go.

The drives check out fine.  The RAM checks out fine.  The machine
seems to run fine otherwise.

Is there anything anyone can think of or suggest that might be causing
this?

Thanks

Bob