Port-sparc64 archive


Re: Memory/data errors



On Mon, 19 Mar 2018 16:15:56 +0000 (UTC)
Eduardo Horvath <eeh%NetBSD.org@localhost> wrote:

> On Sun, 18 Mar 2018, Sad Clouds wrote:
> 
> > Mar 15 17:53:42 ultra10 /netbsd: data error type 32 sfsr=0
> > sfva=425de020 afsr=400008 afva=17ff7b6fbf8 tf=0x1186c7ed0
> 
> Trap type 32 is a memory access error fault.  It happens when an
> attempt to access a device fails.
> 
> Since the synchronous fault status register (SFSR) is zero, the
> contents of SFVA are probably irrelevant.
> 
> The AFSR, however, does have interesting values, so the AFVA is
> probably valid.  The AFSR has the EDP bit set, which indicates a data
> parity error in the E$ SRAM.  Bit 3 in the syndrome is set, pointing
> to data bits 31-24.
> 
> The memory coherence domain on ultrasparc processors is the E$.
> Since neither the UE nor CE bits are set, this is probably not
> indicative of bad DRAM.  And the AFSR is not updated when the EDP bit
> is set.
> 
> The EDP bit indicates a parity error on an E$ data read.
> 
> I'd say either your E$ SRAM is dying, or some operation is causing
> the contents of the E$ SRAM to get corrupted.
> 
> Eduardo

I think you were right about the CPU cache dying. I've replaced the CPU
module (no idea why I had a spare one, must have pulled it out of a
broken Ultra 10 years ago) and there are no more errors. This machine
has been running for days now, building packages and NetBSD, without a
single issue so far.
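
For anyone who hits a similar message, the AFSR decoding described in the
quoted reply can be sketched roughly as below. This is only an
illustration, not kernel code. The EDP position (bit 22) follows from the
logged value afsr=400008. The UE and CE positions (bits 21 and 20), the
16-bit syndrome field, and the rule that syndrome bit n covers data bits
8n+7 to 8n are assumptions, generalized from the single "bit 3 -> data
bits 31-24" case above, and should be checked against the CPU manual.
The name decode_afsr is made up for this example.

```python
# Hypothetical helper: decode the AFSR fields discussed in this thread.
# Bit positions other than EDP are assumptions; verify them against the
# UltraSPARC manual for your CPU before relying on them.
AFSR_EDP = 1 << 22          # E$ data parity error (matches afsr=400008)
AFSR_UE = 1 << 21           # uncorrectable memory error (assumed position)
AFSR_CE = 1 << 20           # correctable memory error (assumed position)
AFSR_PSYND_MASK = 0xFFFF    # parity syndrome, assumed to be the low 16 bits


def decode_afsr(afsr):
    """Return (names of set error flags, list of suspect data-bit ranges)."""
    flags = [
        name
        for name, bit in (("EDP", AFSR_EDP), ("UE", AFSR_UE), ("CE", AFSR_CE))
        if afsr & bit
    ]
    psynd = afsr & AFSR_PSYND_MASK
    # Assumed mapping: syndrome bit n flags the byte lane holding
    # data bits 8n+7 down to 8n.
    lanes = [(8 * n + 7, 8 * n) for n in range(16) if psynd & (1 << n)]
    return flags, lanes


if __name__ == "__main__":
    flags, lanes = decode_afsr(0x400008)
    print("flags:", flags)
    print("suspect data bits:", lanes)
```

With the value from the log, this reports only EDP and the byte lane
31-24, with neither UE nor CE set, which matches the reading that the
E$ SRAM rather than DRAM was at fault.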

