Subject: Re: ffs panic with 1.5C (19/11/2000)
To: Bernd Sieker <bsieker@freenet.de>
From: Luke Mewburn <lukem@cs.rmit.edu.au>
List: tech-kern
Date: 11/07/2000 13:11:31
Bernd Sieker writes:
> On 07.11.00, 00:35:01, Darren Reed wrote:
> > blah, I was going to ask if anyone knew of a good dos based memory
> > testing utility.
> 
> Here's something even better. Stand-alone, i. e. boots on the bare
> iron (or rather, the BIOS), and tests _all_ the memory quite
> thoroughly, with caches on/off, with different refresh timings and
> different bit patterns.
> 
> http://reality.sgi.com/cbrady_denver/memtest86/

As I said in my earlier post, this program failed to find bad memory
in two separate systems of mine. One had 2 x 256MB PC133 DIMMs +
2 x 128MB PC100 DIMMs (the board only needs PC100; by trial and error
I determined that one of the 256MB DIMMs is faulty). The other had a
single 128MB PC133 DIMM, which turned out to be faulty as well.

In both cases, memtest86 ran for over 12 hours without reporting a
problem. The faults only showed up under other workloads, such as:
	* running bonnie in multiuser mode on an IDE raidframe set
	  would panic (single user mode wouldn't)
	* IE5.0 would crash under W2K. IE5.5 became a bit more stable,
	  but Half Life would crash about 30% of the time when
	  loading or saving a `saved' game.
	* Kernel compiles would die with signal 11 (SIGSEGV) a lot.

My (probably incorrect) gut feel is that various memory faults are
often not found by a simple `walk the RAM' test, and it's only when
you start doing other stuff that is `more real world' (possibly
including DMA to/from a device) that you trigger the fault.
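
To make `walk the RAM' concrete, here is a minimal sketch in C of a
walking-ones pattern test over a buffer. It is only an illustration
of the general idea and makes no claim about how memtest86 itself is
implemented; the names walk_ram() and pattern_pass() are made up for
this example.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Write `pattern' to every word, then read it all back.
 * Returns the number of mismatching words (0 means a clean pass).
 * (Hypothetical helper, named for this example only.)
 */
static size_t
pattern_pass(volatile uint32_t *buf, size_t nwords, uint32_t pattern)
{
	size_t i, bad = 0;

	for (i = 0; i < nwords; i++)
		buf[i] = pattern;
	for (i = 0; i < nwords; i++)
		if (buf[i] != pattern)
			bad++;
	return bad;
}

/*
 * Walking-ones test: repeat the pass with a single set bit in each
 * of the 32 positions, and again with the complement of each.
 * Returns the total number of mismatches seen.
 */
size_t
walk_ram(volatile uint32_t *buf, size_t nwords)
{
	size_t bad = 0;
	int bit;

	for (bit = 0; bit < 32; bit++) {
		uint32_t p = (uint32_t)1 << bit;

		bad += pattern_pass(buf, nwords, p);
		bad += pattern_pass(buf, nwords, ~p);
	}
	return bad;
}
```

A loop like this only exercises CPU loads and stores, one access
pattern at a time, over whatever memory it is handed. It never
involves a device doing DMA, which would fit with faults only
appearing under the workloads listed above.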

I don't know how the commercial hardware RAM testers compare in such
situations.