Subject: what's this machine check mean?
To: None <port-alpha@netbsd.org>
From: der Mouse <mouse@Rodents.Montreal.QC.CA>
List: port-alpha
Date: 04/15/2000 02:21:27
How does one go about decoding machine checks to figure out what they
indicate a problem with?

I've just had a machine crash twice in fairly quick succession with

unexpected machine check:

    mces    = 0x1
    vector  = 0x670
    param   = 0xfffffc0000006048
    pc      = 0xfffffc000050f3e4
    ra      = 0xfffffc000050f3a8
    curproc = 0x0

panic: machine check
Stopped at      Debugger+0x4:   ret     zero,(ra)
db> 

(That's from the second time; the first time, the pc was ...50f3c8 and
there was a non-nil curproc.)

I'd like to know how to tell what, if anything, this points to as
problematic: CPU? RAM? kernel? etc...
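A first step is decoding the fields the kernel printed. The sketch below is a minimal decoder with several assumptions. It assumes the MCES bit names follow the Alpha SRM / NetBSD alpha_cpu.h definitions, and that the machine-check vector codes (0x620, 0x630, 0x660, 0x670) follow the OSF PALcode convention. Both tables are recalled rather than checked, so compare them with your headers. It also assumes param is a K0SEG (direct-mapped) kernel address, so subtracting 0xfffffc0000000000 gives a physical address. The function names are hypothetical.

```python
# Sketch: decode the fields printed by NetBSD/alpha's "unexpected machine check".
# ASSUMPTION: the MCES bit names and machine-check vector codes below follow the
# Alpha SRM / OSF PALcode conventions as recalled; verify against alpha_cpu.h.

K0SEG_BASE = 0xFFFFFC0000000000  # first address of the direct-mapped K0SEG
K0SEG_END = 0xFFFFFDFFFFFFFFFF   # last address of K0SEG (inclusive)

MCES_BITS = {
    0x01: "MIP: machine check in progress",
    0x02: "SCE: system correctable error",
    0x04: "PCE: processor correctable error",
    0x08: "DPC: processor correctable error reporting disabled",
    0x10: "DSC: system correctable error reporting disabled",
}

# Machine-check vector codes (assumed; see note above).
VECTORS = {
    0x620: "system correctable error",
    0x630: "processor correctable error",
    0x660: "system machine check",
    0x670: "processor machine check",
}


def decode_mces(mces):
    """List a description for every known MCES bit that is set."""
    return [text for bit, text in sorted(MCES_BITS.items()) if mces & bit]


def decode_vector(vector):
    """Name the machine-check class for a vector code, if it is in the table."""
    return VECTORS.get(vector, "unknown vector 0x%x" % vector)


def k0seg_to_phys(vaddr):
    """Physical address behind a K0SEG virtual address, or None if outside K0SEG."""
    if K0SEG_BASE <= vaddr <= K0SEG_END:
        return vaddr - K0SEG_BASE
    return None


if __name__ == "__main__":
    # Values from the second crash.
    print(decode_mces(0x1))                        # MIP only
    print(decode_vector(0x670))                    # processor machine check
    print(hex(k0seg_to_phys(0xFFFFFC0000006048)))  # 0x6048
```

With the second crash's values, the decoder reports only the machine-check-in-progress bit. Under the assumed table, it classifies the event as a processor (rather than system) machine check. It also reports a param pointing at physical 0x6048, which is presumably the PALcode logout area. That area's layout is specific to the CPU (a 21066 here), so reading it would need the chip's documentation.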

Kernel and userland are off the 1.4.2 CD; the first-14-blocks bootblock
is from a 1.4 machine (this is because I built the boot disk on a SPARC,
and to get bootblocks on it I ran installboot on a file on a
NetBSD/alpha 1.4 machine, labeled the result via a vnd, and dded it to
the disk).

I don't (yet) know what routine that pc falls into; I'm currently
copying the kernel in question to the 1.4 alpha box, but it's over a
slow netlink, so it's taking a while.
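Once the kernel is available, the usual way to place the pc (and ra) is to list the kernel's symbols in address order, for example with `nm -n netbsd`. Then take the nearest text symbol at or below each address. The boot messages show the ELF symbol table being preserved, and ddb already prints "Debugger+0x4", so ddb at the db> prompt may be able to give the same answer directly. Below is a small sketch of the lookup. It assumes nm's common "address type name" line format. The symbol names and addresses in the usage part are hypothetical placeholders, not taken from the real 1.4.2 kernel.

```python
import bisect


def parse_nm(lines):
    """Parse `nm -n`-style lines ("ADDR TYPE NAME") into sorted (addr, name)
    pairs, keeping only text symbols (types 't' and 'T')."""
    syms = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # undefined symbols have no address; skip odd lines too
        addr, kind, name = parts
        if kind in ("t", "T"):
            syms.append((int(addr, 16), name))
    syms.sort()
    return syms


def lookup(syms, pc):
    """Return (symbol, offset) for the nearest text symbol at or below pc,
    or None if pc lies below every symbol."""
    addrs = [addr for addr, _ in syms]
    i = bisect.bisect_right(addrs, pc) - 1
    if i < 0:
        return None
    addr, name = syms[i]
    return name, pc - addr


if __name__ == "__main__":
    # HYPOTHETICAL symbol table; real names/addresses come from `nm -n netbsd`.
    table = parse_nm([
        "fffffc000050f300 T example_func_a",
        "fffffc000050f500 T example_func_b",
    ])
    for label, addr in [("pc", 0xFFFFFC000050F3E4), ("ra", 0xFFFFFC000050F3A8)]:
        name, off = lookup(table, addr)
        print("%s = %s+0x%x" % (label, name, off))
```

With the made-up table, this prints `pc = example_func_a+0xe4` and `ra = example_func_a+0xa8`. With the real symbol list, the same lookup would say which routine was running when the check hit. Because ra is close to pc, the two addresses likely fall in the same function.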

The machine check occurred while I was typing a command (second time)
or copying and extracting the comp binary set (first time).  Here's
everything the console got since the last power-up (carriage returns and
backspace overstrikes edited out).

*** keyboard not plugged in...
starting console on CPU 0
initialized idle PCB
initializing semaphores
initializing heap
initial heap 1c0c0
memory low limit = 100000
heap = 1c0c0, 17fc0
initializing driver structures
initializing idle process PID
XDELTA not enabled.
initializing file system
initializing 8259s
initializing timer data structures
lowering IPL
CPU 0 speed is 5.99 ns (167MHz)

Powerup process has started
CPU ID = 0
Initialize configuration locks etc.
Configure the memory
Initialising ISA/PCI interrupts.
Configure the PCI Bus
Start tt class, port (graphics, keyboard, then serial) drivers
entering idle loop
change stdin/out/err channels from nl to tt
Memory size = 16Mbytes
testing memory from 800000 to e16000 ...
Start driver phase 4
access NVRAM
Build this processor's slot in the hwrpb
Backup Cache size = 0Kb
Init SCSI class driver
Start driver phase 5
AXPpci33 Common Console X4.7-1860, built on Nov  1 1996 at 06:00:01
>>>boot dka0 -flags sn
(boot dka0.0.0.6.0 -flags sn)
block 0 of dka0.0.0.6.0 is a valid boot block
reading 14 blocks from dka0.0.0.6.0
bootstrap code read in
base = 110000, image_start = 0, image_bytes = 1c00
initializing HWRPB at 2000
initializing page table at 102000
initializing machine state
setting affinity to the primary CPU
jumping to bootstrap code

NetBSD/Alpha 1.4 FFS Primary Bootstrap
Jumping to entry point...

NetBSD/alpha 1.4.2 Secondary Bootstrap, Revision 1.10
(toddpw@chewie.toddpw.net, Mar 3 18:37:40 PST 2000)

VMS PAL rev: 0x1000400010538
OSF PAL rev: 0x100090002012d
Switch to OSF PAL code succeeded.

Boot flags: sn

Loading netbsd...
3029528+279892 [85+205368+117098]

Entering netbsd at 0xfffffc0000301100...
[ preserving 323456 bytes of netbsd ELF symbol table ]
consinit: not using prom console
Copyright (c) 1996, 1997, 1998, 1999, 2000
    The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
    The Regents of the University of California.  All rights reserved.

NetBSD 1.4.2 (GENERIC) #1: Sat Mar  4 03:08:26 PST 2000
    toddpw@chewie.toddpw.net:/usr/src/sys/arch/alpha/compile/GENERIC
Alpha PC AXPpci33, 166MHz
8192 byte page size, 1 processor.
real mem = 16777216 (2072576 reserved for PROM, 14704640 used by NetBSD)
avail mem = 9125888
using 179 buffers containing 1466368 bytes of memory
mainbus0 (root)
cpu0 at mainbus0: ID 0 (primary), LCA-2 (21066 pass 2)
lca0 at mainbus0
pci0 at lca0 bus 0
pci0: i/o enabled, memory enabled
ncr0 at pci0 dev 6 function 0: ncr 53c810 fast10 scsi
ncr0: interrupting at isa irq 11
ncr0: minsync=25, maxsync=206, maxoffs=8, 16 dwords burst, normal dma fifo
ncr0: single-ended, open drain IRQ driver
ncr0: restart (scsi reset).
scsibus0 at ncr0: 8 targets, 8 luns per target
sd0 at scsibus0 targ 0 lun 0: <SGI, SEAGATE ST51080N, 0950> SCSI2 0/direct fixed
sd0(ncr0:0:0): 10.0 MB/s (100 ns, offset 8)
sd0: 1010MB, 4826 cyl, 4 head, 107 sec, 512 bytes/sect x 2070235 sectors
sio0 at pci0 dev 7 function 0: Intel 82378ZB System I/O (SIO) (rev. 0x03)
ne0 at pci0 dev 12 function 0: RealTek 8029 Ethernet
ne0: 10base2, 10baseT, 10baseT-FDX, auto, default auto
ne0: Ethernet address 00:80:c8:df:2e:48
ne0: interrupting at isa irq 5
isa0 at sio0
com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
com0: console
com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, working fifo
wdc0 at isa0 port 0x1f0-0x1f7 irq 14
lpt0 at isa0 port 0x3bc-0x3bf irq 7
pckbc0 at isa0 port 0x60-0x64
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
isabeep0 at pcppi0
fdc0 at isa0 port 0x3f0-0x3f7 irq 6 drq 2
mcclock0 at isa0 port 0x70-0x71: mc146818 or compatible
root device (default sd0a): 
dump device (default sd0b): 
file system (default generic): 
root on sd0a dumps on sd0b
root file system type: ffs
Nov 29 20:43:44 init: /etc/pwd.db: No such file or directory
Enter pathname of shell or RETURN for sh: 
# fsck_ffs -f /dev/rsd0a
** /dev/rsd0a
** Last Mounted on /
** Root file system
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
2187 files, 48226 used, 894637 free (917 frags, 111715 blocks, 0.1% fragmentation)

MARK FILE SYSTEM CLEAN? [yn] y


***** FILE SYSTEM MARKED CLEAN *****

***** FILE SYSTEM WAS MODIFIED *****
# mount /dev/sd0a /
# ifconfig ne0 216.46
unexpected machine check:

    mces    = 0x1
    vector  = 0x670
    param   = 0xfffffc0000006048
    pc      = 0xfffffc000050f3e4
    ra      = 0xfffffc000050f3a8
    curproc = 0x0

panic: machine check
Stopped at      Debugger+0x4:   ret     zero,(ra)
db>
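The bootstrap's own messages allow a rough check that the second crash's pc and ra fall inside the loaded kernel's text. This is a sketch with two assumptions. It assumes the first figure in "3029528+279892" is the text segment size. It also assumes the text begins at the entry point printed by "Entering netbsd at ...", which is only approximately true, so the bounds are rough.

```python
# Figures copied from the boot messages above.
ENTRY = 0xFFFFFC0000301100  # "Entering netbsd at 0xfffffc0000301100..."
TEXT_SIZE = 3029528         # ASSUMED: first figure of "3029528+279892" is text size


def in_kernel_text(addr, start=ENTRY, size=TEXT_SIZE):
    """Rough test, ASSUMING the text segment starts at the entry point."""
    return start <= addr < start + size


if __name__ == "__main__":
    for label, addr in [
        ("pc", 0xFFFFFC000050F3E4),
        ("ra", 0xFFFFFC000050F3A8),
        ("param", 0xFFFFFC0000006048),
    ]:
        print(label, hex(addr - ENTRY), in_kernel_text(addr))
```

Under these assumptions, pc is 0x20e2e4 (2155236) bytes past the entry point, well within the roughly 3 MB of text. So the check was taken while ordinary kernel code was executing. param lies far below the kernel image. By itself this doesn't distinguish a kernel bug from a CPU or memory fault.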