Subject: Re: 5000/240 netboot/tftp failure
To: Michael K. Sanders <msanders@aros.net>
From: Jonathan Stone <jonathan@DSG.Stanford.EDU>
List: port-pmax
Date: 07/07/1997 20:05:22
 > I sent this a couple days ago, but didn't get any responses (except
 > from Simon Burge clarifying DECsystem vs. DECstation, thanks!).
 > 
 > Can anyone tell me what this means or suggest something else to try?

 > >>boot 3/tftp
 >   
 > ???
 > ? PC:  0x80021fe4<vtr=NRML>
 > ? CR:  0x30000010<CE=3,EXC=AdEL>
 > ? SR:  0x30080000<CU1,CU0,CM,IPL=8>
 > ? VA:  0xa000ef3a
 > 
 > :: Mike ::
 > 
 > 


This looks very nasty: 0x80020000 is in the PROM, not in the kernel.  Can
you run tcpdump on the Ethernet segment?  I'd guess the PROM is
crashing before it even talks to the net.  I'd suspect bad memory: in
fact, a fault accessing physical address 0x0000ef3a through its
uncached mapping (the VA 0xa000ef3a above).
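
For anyone who wants to check the address arithmetic, here is a
minimal sketch of the fixed MIPS segment map used by R3000-class CPUs.
It is written from the architecture rules, not taken from the PROM or
NetBSD sources, and the function name decode_segment is made up for
this example.

```python
# Sketch only: the R3000-style fixed segment map, written from the MIPS
# architecture rules rather than from any PROM or NetBSD source.
def decode_segment(va):
    """Return (segment name, physical address or None if TLB-mapped)."""
    va &= 0xFFFFFFFF
    if va < 0x80000000:
        return ("kuseg", None)              # user space, translated by the TLB
    if va < 0xA0000000:
        return ("kseg0", va - 0x80000000)   # unmapped, cached
    if va < 0xC0000000:
        return ("kseg1", va - 0xA0000000)   # unmapped, uncached
    return ("kseg2", None)                  # kernel space, translated by the TLB

# The two addresses from the register dump quoted above.
for label, va in (("PC", 0x80021FE4), ("VA", 0xA000EF3A)):
    segment, pa = decode_segment(va)
    where = "TLB-mapped" if pa is None else "physical 0x%08x" % pa
    print("%s 0x%08x: %s, %s" % (label, va, segment, where))
```

Run on the dump above, this puts the PC in kseg0 (cached) at physical
0x00021fe4 and the faulting VA in kseg1 (uncached) at physical
0x0000ef3a.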

I'm too lazy to decode the CAUSE register and UTSL to figure out
what's going wrong.  If memory serves, you're getting a fault in an
insn fetch or data load, which is consistent with bad memory.
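
For reference, a minimal sketch of decoding the CAUSE value by hand,
assuming the standard R3000 field layout (BD in bit 31, CE in bits
29:28, IP in bits 15:8, ExcCode in bits 5:2).  The names EXC_NAMES and
decode_cause are made up for this example; the mnemonics are the
usual architecture ones, and AdEL is "address error on load or
instruction fetch".

```python
# Sketch only: R3000 Cause register fields, assuming the standard layout.
EXC_NAMES = {
    0: "Int", 1: "Mod", 2: "TLBL", 3: "TLBS", 4: "AdEL", 5: "AdES",
    6: "IBE", 7: "DBE", 8: "Sys", 9: "Bp", 10: "RI", 11: "CpU", 12: "Ovf",
}

def decode_cause(cause):
    """Split a 32-bit Cause value into its named fields."""
    return {
        "BD": (cause >> 31) & 0x1,    # exception taken in a branch delay slot
        "CE": (cause >> 28) & 0x3,    # coprocessor number, valid only for CpU
        "IP": (cause >> 8) & 0xFF,    # pending interrupt bits
        "EXC": EXC_NAMES.get((cause >> 2) & 0xF, "reserved"),
    }

# The CR value from the register dump quoted above.
print(decode_cause(0x30000010))
```

For 0x30000010 this gives CE=3 and EXC=AdEL, the same fields the PROM
printed next to the raw value.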

The 5000/240 has ECC with the same controller and software fixup as
the 5000/200.  I don't know if the PROM recovers from ECC errors,
though.  What happens if you do

	>>  t 3

(Or maybe just 'test', I forget.)  
Does it report a correctable memory fault?

If so, I'd suggest getting a protective-earth wrist strap and
changing the memory boards, or at least rearranging them so the board
in slot 0 is in the last occupied slot.

I should start charging for remote diagnostic services or something :)