Port-sparc64 archive


Re: Memory/data errors



On Mon, 19 Mar 2018 06:22:25 +1100
matthew green <mrg%eterna.com.au@localhost> wrote:

> Sad Clouds writes:
> > Hello, I've been seeing various errors and kernel hangs with
> > NetBSD-8 on Sun Ultra 10. This is a rather old machine, so I'm
> > assuming hardware is starting to fail, etc. I did run max
> > diagnostics at openboot and it didn't find any issues.
> > 
> > Normally, I would run NetBSD build.sh and sooner or later GCC would
> > segfault, or kernel would hang. I was also seeing the following
> > errors logged:
> > 
> > Mar 15 17:53:42 ultra10 /netbsd: data error type 32 sfsr=0
> > sfva=425de020 afsr=400008 afva=17ff7b6fbf8 tf=0x1186c7ed0
> > 
> > So I upgraded to the latest snapshot and changed from GENERIC to
> > GENERIC.UP and it seems much more stable now. I have not seen any
> > kernel hangs for a few hours, but I'm still running build.sh and
> > it's too early to celebrate.
> > 
> > A few questions though, has anyone noticed anything similar when
> > running GENERIC? Could these be some race conditions which are not
> > present in uniprocessor kernel?
> 
> my ultra10s got this disease.  i still run one of them and it
> occasionally hangs when idle or busy.  one of the problems with
> the ultra10 is that when we try to 'sir' to recover from some
> types of fatal error it hangs instead of resetting.  i never got
> around to seeing if we do something to cause it.
> 
> i haven't seen any issues i'd relate to GENERIC vs UP, but this
> is an interesting point.  please let us know if this stays up.
> 
> 
> .mrg.

Well, so far with the latest GENERIC.UP my Ultra 10 has been running
build.sh all day today without a single issue.

When I was running GENERIC from October 2017, a process would crash or
the kernel would hang within an hour. I suspected hardware issues, as I
had quite a few bulging capacitors on the mainboard with electrolyte
leaking out. So I got a soldering iron and replaced all the capacitors,
and also replaced the power supply, just in case. This didn't seem to
help: I was still getting kernel hangs until I booted the latest
GENERIC.UP kernel.

I'll be doing some more stress testing, but it looks like the GENERIC
kernel might have been the culprit.
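
For anyone who wants to compare the "data error" lines quoted earlier in
this thread across several runs, something along these lines might help.
It is only a rough sketch: parse_data_error is a hypothetical helper, not
part of NetBSD, and it assumes that every key=value field in the message
is printed in hexadecimal (with or without a leading 0x). The trap type
is kept as the string that was logged, since I have not checked which
base it is printed in.

```python
import re

# Matches fields such as "sfsr=0", "afsr=400008" or "tf=0x1186c7ed0".
# Assumption: all values are hexadecimal, with an optional "0x" prefix.
FIELD_RE = re.compile(r"(\w+)=(?:0x)?([0-9a-fA-F]+)")
TYPE_RE = re.compile(r"data error type (\w+)")


def parse_data_error(line):
    """Split one 'data error' log line into its fields.

    Returns None when the line is not a 'data error' message. Otherwise
    returns a dict with the trap type as the logged string and every
    key=value field converted to an integer.
    """
    match = TYPE_RE.search(line)
    if match is None:
        return None
    fields = {"type": match.group(1)}
    for name, digits in FIELD_RE.findall(line):
        fields[name] = int(digits, 16)
    return fields


if __name__ == "__main__":
    # The message from the thread, joined back into a single line.
    sample = (
        "Mar 15 17:53:42 ultra10 /netbsd: data error type 32 sfsr=0 "
        "sfva=425de020 afsr=400008 afva=17ff7b6fbf8 tf=0x1186c7ed0"
    )
    for name, value in parse_data_error(sample).items():
        print(name, value if name == "type" else hex(value))
```

Feeding it lines grepped out of /var/log/messages would make it easy to
see whether afsr and afva repeat between hangs, which might hint at
whether one memory location is involved or the faults are scattered.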

