tech-kern: Re: ffs panic with 1.5C (19/11/2000)

Subject: Re: ffs panic with 1.5C (19/11/2000)
To: None <lukem@cs.rmit.edu.au>
From: Brett Lymn <blymn@baesystems.com.au>
List: tech-kern
Date: 11/11/2000 14:36:14

According to Luke Mewburn:
>
>My (probably incorrect) gut feel is that various memory issues are
>often not found by a simple `walk the RAM' issue,
>

No, you are quite correct there.  I once wrote some memory testing
programs and know that it is inordinately difficult to write a good
one, i.e. one that picks up subtle memory errors.  All sorts of things
make it difficult.  Probably the worst is parasitic capacitance on the
bus lines, which can make a dead data line look like it is working if
you perform the write and the read back fast enough.  A dud address
line can cause two different addresses to read and write the same
location; if you just fill memory with the same data everywhere you
never spot it.  Just about any pattern you use (walking ones, walking
zeros, 55's, EE's and so on) can fail to pick up one sort of fault or
another.  Loading the memory with random data and doing random reads
can be effective, but making sure you read every byte is a bear.
Doing a combination of lots of different usage patterns should show
up most faults.
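
To illustrate the dud address line case, here is a toy simulation in
Python.  It is only a sketch, not a real memory tester: the
FaultyMemory class and its stuck_bit parameter are made up for the
example, and it models just one kind of fault (an address bit stuck
at 0).  A constant fill passes on the broken memory, while writing
each location's own address into it shows up the aliasing.

```python
class FaultyMemory:
    """Toy model of RAM; address bit `stuck_bit` is stuck at 0 (hypothetical)."""

    def __init__(self, size, stuck_bit=None):
        self.cells = [0] * size
        self.stuck_bit = stuck_bit

    def _decode(self, addr):
        # With a stuck address bit, two addresses select the same cell.
        if self.stuck_bit is None:
            return addr
        return addr & ~(1 << self.stuck_bit)

    def write(self, addr, value):
        self.cells[self._decode(addr)] = value

    def read(self, addr):
        return self.cells[self._decode(addr)]


def constant_fill_test(mem, size, pattern=0x55):
    """Fill every location with the same pattern, then read it all back."""
    for addr in range(size):
        mem.write(addr, pattern)
    return all(mem.read(addr) == pattern for addr in range(size))


def address_pattern_test(mem, size):
    """Write each location's own address into it, then read it all back."""
    for addr in range(size):
        mem.write(addr, addr)
    return all(mem.read(addr) == addr for addr in range(size))


good = FaultyMemory(64)
bad = FaultyMemory(64, stuck_bit=3)

print("constant fill, bad RAM:   ", constant_fill_test(bad, 64))
print("address pattern, bad RAM: ", address_pattern_test(bad, 64))
print("address pattern, good RAM:", address_pattern_test(good, 64))
```

The constant fill reports the broken memory as fine because the two
aliased addresses always hold the same value anyway.  The address
pattern fails because the second write to the shared cell overwrites
the first with a different value.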

> and it's only when
>you start doing other stuff that is `more real world' (including
>possibly doing DMA to/from a device) that will trigger the fault.
>

Yes, random data being written to and read from random locations :-)
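
One way to get random data at random locations while still being sure
every byte is visited is to shuffle the full list of addresses rather
than pick addresses at random.  The Python sketch below assumes that
approach; random_order_test and its read/write callbacks are
hypothetical names for the example, and it keeps the expected values
in a separate list, which a real tester would more likely regenerate
from the seed instead of storing.

```python
import random


def random_order_test(read, write, size, seed=0):
    """Write random bytes in one random order, verify in another.

    Returns the sorted list of addresses whose contents did not match.
    Each address is visited exactly once per pass because the orders
    are permutations of range(size).
    """
    rng = random.Random(seed)
    expected = [rng.randrange(256) for _ in range(size)]

    write_order = list(range(size))
    rng.shuffle(write_order)
    for addr in write_order:
        write(addr, expected[addr])

    read_order = list(range(size))
    rng.shuffle(read_order)
    bad = [addr for addr in read_order if read(addr) != expected[addr]]
    return sorted(bad)


# Healthy memory, modelled as a plain bytearray.
ram = bytearray(256)
print(random_order_test(ram.__getitem__, ram.__setitem__, 256))

# Memory with data bit 2 stuck at 0 (made-up fault for the example).
broken = bytearray(256)


def stuck_write(addr, value):
    broken[addr] = value & ~0x04


failures = random_order_test(broken.__getitem__, stuck_write, 256)
print(len(failures), "bad locations")
```

Using a fixed seed makes a failing run repeatable, which helps when
chasing an intermittent fault.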

Once upon a time I had an algorithm that was supposed to test DRAM
thoroughly.  It was supposed to be based on the device characteristics
so that it could pick up DRAM-specific faults.  I can try to dig up a
copy if anyone is interested.

-- 
===============================================================================
Brett Lymn, Computer Systems Administrator, BAE SYSTEMS
===============================================================================