Port-sparc64 archive


Re: Memory/data errors



On Sun, 18 Mar 2018 21:12:07 +0000
Sad Clouds <cryintothebluesky%gmail.com@localhost> wrote:

> On Mon, 19 Mar 2018 06:22:25 +1100
> matthew green <mrg%eterna.com.au@localhost> wrote:
> 
> > Sad Clouds writes:
> > > Hello, I've been seeing various errors and kernel hangs with
> > > NetBSD-8 on a Sun Ultra 10. This is a rather old machine, so I'm
> > > assuming the hardware is starting to fail. I did run maximum
> > > diagnostics in OpenBoot and they didn't find any issues.
> > > 
> > > Normally, I would run NetBSD's build.sh and sooner or later GCC
> > > would segfault or the kernel would hang. I was also seeing the
> > > following errors logged:
> > > 
> > > Mar 15 17:53:42 ultra10 /netbsd: data error type 32 sfsr=0
> > > sfva=425de020 afsr=400008 afva=17ff7b6fbf8 tf=0x1186c7ed0
> > > 
> > > So I upgraded to the latest snapshot and changed from GENERIC to
> > > GENERIC.UP, and it seems much more stable now. I haven't seen any
> > > kernel hangs for a few hours, but I'm still running build.sh and
> > > it's too early to celebrate.
> > > 
> > > A few questions though: has anyone noticed anything similar when
> > > running GENERIC? Could these be race conditions that are not
> > > present in the uniprocessor kernel?
> > 
> > my ultra10s got this disease.  i still run one of them and it
> > occasionally hangs when idle or busy.  one of the problems with
> > the ultra10 is that when we try to 'sir' to recover from some
> > types of fatal error, it hangs instead of resetting.  i never got
> > around to seeing if we do something to cause it.
> > 
> > i haven't seen any issues i'd relate to GENERIC vs UP, but this
> > is an interesting point.  please let us know if this stays up.
> > 
> > 
> > .mrg.
> 
> Well, so far with the latest GENERIC.UP my Ultra 10 has been running
> build.sh all day today without a single issue.
> 
> When I was running GENERIC from October 2017, within an hour a
> process would crash or the kernel would hang. I suspected hardware
> issues, as there were quite a few bulging capacitors on the mainboard
> with electrolyte leaking out. So I got a soldering iron and replaced
> all the capacitors, and also replaced the power supply, just in case.
> This didn't seem to help; I was still getting kernel hangs until I
> booted the latest GENERIC.UP kernel.
> 
> I'll be doing some more stress testing, but it looks like the GENERIC
> kernel might have been the culprit.

No, I spoke too soon:

ultra10# data error type 32 sfsr=0 sfva=41cf6000 afsr=80400008 afva=17ef7bebbf0 tf=0x12e67bed0
data fault: pc=1000d0c addr=41cf6000 sfsr=0x0<ASI=0x0>
Skipping crash dump on recursive panic
panic: kernel fault
cpu0: Begin traceback...
cpu0: End traceback...
rebooting

I have a few spare memory modules, so I will try them out one by one. If nothing
helps, it's curtains for this machine.

