Subject: Re: deadbeee ?
To: None <current-users@NetBSD.org>
From: Rafal Boni <rafal@pobox.com>
List: current-users
Date: 01/22/2004 13:15:46
In message <22550.198.26.125.13.1074722240.squirrel@www.cynjut.net>, Dave
writes:

-> <quote who="Matthias Scheler">
-> > In article <20040120164606.GA166@quartz.newn.cam.ac.uk>,
-> > 	Patrick Welche <prlw1@newn.cam.ac.uk> writes:
-> [...elided...]
-> >>
-> >> and all froze. Does that say "memory is going bad"?
-> >
-> > Not necessarily. The kernel could have done something like "foo->bar--" on
-> > the overwritten memory location. But it could of course also be a hardware
-> > problem. A good test is to compile something big (e.g. KDE 3). If your
-> > computer survives that without a segmentation fault from gcc the memory
-> > is probably ok.
-> 
-> I've tried that in the past; it works OK.  A better test is to build and
-> use the "memtest" boot disk from pkgsrc.
-> 
-> I built one a few weeks ago, and found out why three of my servers were
-> randomly rebooting/freezing in ways that look alarmingly like this.  I
-> also threw away three sticks of RAM that were 'only bad in one spot'.

Actually, my experience says that doing both is preferable.  I've
had machines which passed memtest but where the "rebuild a large piece
of software in a loop" test would fail, with gcc dying from random SEGVs.
gcc does a pretty good job of stressing a machine's memory, and
sometimes it helps to have the machine doing CPU-intensive work while
under test (to expose thermal issues, for example).

I've also had machines where the gcc test seemed OK but memtest did
turn up issues, so now, when I see suspicious behaviour, I run the gcc
test first and then crank up memtest.
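As for the "deadbeee" value in the subject: assuming the freed memory had been filled with the 0xdeadbeef poison pattern (an assumption, since the thread doesn't show the original pattern), the arithmetic fits both explanations quoted above. A stray "foo->bar--" through a stale pointer and a single flipped low bit in a bad RAM cell produce exactly the same word, so the value on its own can't tell you which one happened. A small Python sketch of that check, with hypothetical helper names that only model the two failure modes:

```python
# Assumption: the freed memory held the 0xdeadbeef poison pattern.
POISON = 0xDEADBEEF
MASK32 = 0xFFFFFFFF


def decrement(word):
    """Model a C 32-bit unsigned 'word--' (wraps around at zero)."""
    return (word - 1) & MASK32


def flip_bit(word, bit):
    """Model a single bad RAM cell flipping one bit of the word."""
    return word ^ (1 << bit)


after_decrement = decrement(POISON)   # software bug: foo->bar--
after_bitflip = flip_bit(POISON, 0)   # hardware fault: bit 0 flipped

print(hex(after_decrement))           # 0xdeadbeee
print(hex(after_bitflip))             # 0xdeadbeee
print(bin(POISON ^ after_decrement))  # 0b1
```

Because 0xef and 0xee differ only in bit 0, the XOR of the two words is 1, which is why either story produces the value in the subject line.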

--rafal

----
Rafal Boni                                                     rafal@pobox.com
  We are all worms.  But I do believe I am a glowworm.  -- Winston Churchill