Subject: Re: 5000/240 netboot/tftp failure
To: Jonathan Stone <jonathan@DSG.Stanford.EDU>
From: Michael K. Sanders <msanders@aros.net>
List: port-pmax
Date: 07/07/1997 21:49:36
Jonathan Stone writes:
>
> > ? PC:  0x80021fe4<vtr=NRML>
> > ? CR:  0x30000010<CE=3,EXC=AdEL>
> > ? SR:  0x30080000<CU1,CU0,CM,IPL=8>
> > ? VA:  0xa000ef3a
>
>Looks very nasty: 0x80020000 is in the PROM, not in the kernel.  Can
>you run a tcpdump on the Ethernet segment?  I'd guess the PROM is
>crashing before it even talks to the net.  I'd suspect bad memory: in
>fact, a fault accessing 0x0000ef3a as an uncached address.
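
For reference, the address arithmetic behind that remark can be sketched in a few lines of Python. This is only an illustration of the standard 32-bit MIPS segment layout (kuseg, kseg0, kseg1, kseg2); the function name `mips_segment` is made up for this example and is not part of any PROM or NetBSD tool.

```python
def mips_segment(va):
    """Classify a 32-bit MIPS virtual address.

    Returns (segment_name, physical_address). The physical address is
    None for the TLB-mapped segments, where it depends on the TLB.
    """
    va &= 0xFFFFFFFF
    if va < 0x80000000:
        return ("kuseg", None)             # user space, TLB-mapped
    if va < 0xA0000000:
        return ("kseg0", va - 0x80000000)  # unmapped, cached
    if va < 0xC0000000:
        return ("kseg1", va - 0xA0000000)  # unmapped, uncached
    return ("kseg2", None)                 # kernel space, TLB-mapped


if __name__ == "__main__":
    # Values taken from the register dump quoted above.
    for label, va in (("PC", 0x80021FE4), ("VA", 0xA000EF3A)):
        seg, pa = mips_segment(va)
        print(f"{label} 0x{va:08x}: {seg}, physical 0x{pa:08x}")
```

Under that layout the faulting VA 0xa000ef3a is the kseg1 (unmapped, uncached) alias of physical 0x0000ef3a, and the PC 0x80021fe4 is the kseg0 (unmapped, cached) alias of physical 0x00021fe4.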

Yep, tcpdump doesn't show anything hitting the wire... the register
dump shows up almost immediately after hitting <ENTER>.

>I'm too lazy to decode the CAUSE register and UTSL to figure out
>what's going wrong.  If memory serves, you're getting a fault in an
>insn fetch or data load, which is consistent with bad memory.

If I knew how to decode them, I might offer... :) 
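
A minimal sketch of the decoding, assuming the R3000-family Cause register layout (BD in bit 31, CE in bits 29:28, IP in bits 15:8, ExcCode in bits 6:2). The names `decode_cause` and `EXC_CODES` are illustrative only, not taken from the PROM or from any NetBSD source.

```python
# Exception codes defined for the R3000 family (assumed layout, see above).
EXC_CODES = {
    0: "Int",   # external interrupt
    1: "Mod",   # TLB modification
    2: "TLBL",  # TLB miss on load or instruction fetch
    3: "TLBS",  # TLB miss on store
    4: "AdEL",  # address error on load or instruction fetch
    5: "AdES",  # address error on store
    6: "IBE",   # bus error on instruction fetch
    7: "DBE",   # bus error on data load
    8: "Sys",   # syscall
    9: "Bp",    # breakpoint
    10: "RI",   # reserved instruction
    11: "CpU",  # coprocessor unusable
    12: "Ov",   # arithmetic overflow
}


def decode_cause(cr):
    """Split a 32-bit Cause register value into its fields."""
    code = (cr >> 2) & 0x1F
    return {
        "BD": (cr >> 31) & 0x1,   # exception taken in a branch delay slot
        "CE": (cr >> 28) & 0x3,   # coprocessor number for CpU exceptions
        "IP": (cr >> 8) & 0xFF,   # pending interrupt bits
        "ExcCode": EXC_CODES.get(code, f"reserved({code})"),
    }


def is_word_aligned(addr):
    """True if addr is a multiple of 4, as a word load or fetch requires."""
    return addr & 0x3 == 0


if __name__ == "__main__":
    print(decode_cause(0x30000010))     # CR from the dump above
    print(is_word_aligned(0xA000EF3A))  # VA from the dump above
```

For CR 0x30000010 this gives CE=3 and ExcCode AdEL, matching the PROM's own annotation; AdEL is an address error on a load or instruction fetch. The VA 0xa000ef3a is not a multiple of 4, so a word-sized load from it would raise AdEL on alignment grounds alone. The dump does not say what access size was involved, so that is a possibility rather than a diagnosis.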

>The 5000/240 has ECC with the same controller and software fixup as
>the 5000/200.  I don't know if the PROM recovers from ECC errors,
>though.  What happens if you do

  >> test

performs various and sundry tests (not exactly speedy, are they? :)...

>Does it report a correctable memory fault?

...but doesn't seem to find any problems at all.

>If so, I'd suggest getting a protective-earth wrist strap and
>changing the memory boards, or at least rearranging them so the board
>in slot 0 is in the last occupied slot.

I've got two boards in it-- should I try swapping them anyway?

>I should start charging for remote diagnostic services or something :)

Heh. :)

Thanks for the suggestions.

:: Mike ::