Subject: Re: 256 MB RAM?
To: None <port-i386@netbsd.org>
From: Greg A. Woods <woods@most.weird.com>
List: port-i386
Date: 04/25/1999 12:33:34
[ On Sunday, April 25, 1999 at 15:48:11 (+1000), Joel Reicher wrote: ]
> Subject: Re: 256 MB RAM? 
>
> > The one minor complication is that dram errors fall into a few
> > categories. 
> > 
> > 1) totally random - eg. alpha particle hits.  Taking a page out of
> > service for one of these is a waste, but shouldn't hurt too much
> > seeing how alpha hits should be well under 1/week (month???).
> 
> FYI I think this is an extreme rarity with modern modules. IIRC it was
> trace amounts of radioactive isotopes in the packaging material which is
> much more refined now.

For alpha particles, yes -- any such particles from outside sources will
be stopped by the packaging material.  They don't usually go very far
through anything of substance -- even cigarette paper can stop the usual
low energy ones (unless it's defectively thin and has holes in it! ;-).

However DRAM devices are getting small enough to be affected by ionizing
radiation, even to the point of multi-bit errors!  Even way back in 1998
(doesn't DRAM get more dense every day? ;-) IBM showed that for every
256MB of memory you'll get one soft error per month.  They (IBM) are now
predicting that multi-bit errors will become more of an issue,
especially for hard failures (where we already have a big problem
because chips are no longer just 1-bit wide), but increasingly for soft
errors too.  IBM's even selling 168-pin DIMMs with integrated ASICs to
do on-board ECC, but they cost 150% of "normal" prices (I'm not sure if
they were referring to normal ECC memory which just contains an extra
4 bits per word, or if they were referring to plain old non-ECC memory).
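
To put the quoted figure in perspective, here is a rough
back-of-the-envelope sketch in Python.  It assumes the rate scales
linearly with memory size and time, and that errors arrive
independently (a Poisson model); neither assumption comes from IBM, and
the function names are made up for illustration.

```python
import math

# Assumption: the rate quoted above, one soft error per 256 MB per month.
ERRORS_PER_MB_MONTH = 1.0 / 256.0


def expected_soft_errors(ram_mb, months):
    """Expected number of soft errors, scaling the quoted rate linearly."""
    return ram_mb * months * ERRORS_PER_MB_MONTH


def prob_at_least_one(ram_mb, months):
    """Chance of one or more errors, assuming independent (Poisson) arrivals."""
    return 1.0 - math.exp(-expected_soft_errors(ram_mb, months))


if __name__ == "__main__":
    for ram_mb in (64, 128, 256, 512):
        print("%4d MB: %.2f expected per month, P(>=1 in a month) = %.2f"
              % (ram_mb,
                 expected_soft_errors(ram_mb, 1),
                 prob_at_least_one(ram_mb, 1)))
```

Under those assumptions a 256MB machine has roughly a 63% chance of
seeing at least one soft error in any given month.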

When ignorant "PC" people say things like this you can see why there's
so much disinformation going around:

   RAM chips have become much more reliable, so parity checking is now
   generally unnecessary, but if memory errors can be expensive for you,
   consider RAM with parity. We have never seen a DIMM stick with
   parity-we don't think they are available. That's a pretty good
   indicator that parity checking is no longer needed.

(from <URL:http://www.execsoft.com/tech-support/articles/art-0023.htm>)

In case you're not up on your DRAM designs, DIMMs will only have ECC
protection, never parity protection.  They're too wide, and doing byte
parity would be silly and probably even more expensive than doing ECC in
the first place.
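
As a rough illustration of the bit budget, here is a small Python
sketch that counts the check bits each scheme needs.  It assumes a
textbook Hamming code extended with one overall parity bit (SEC-DED),
which may not be exactly what any given DIMM or chipset implements; the
function names are hypothetical.

```python
def hamming_check_bits(data_bits):
    """Smallest r with 2**r >= data_bits + r + 1 (single-error correction)."""
    r = 0
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r


def secded_check_bits(data_bits):
    """Hamming check bits plus one overall parity bit for double-error detection."""
    return hamming_check_bits(data_bits) + 1


def byte_parity_bits(data_bits):
    """One parity bit per 8-bit byte (detects, but cannot correct, odd-bit errors)."""
    return (data_bits + 7) // 8


if __name__ == "__main__":
    for width in (8, 32, 64):
        print("%2d-bit word: byte parity needs %d bits, SEC-DED needs %d bits"
              % (width, byte_parity_bits(width), secded_check_bits(width)))
```

For a 64-bit word both schemes come out at 8 extra bits, i.e. the same
72-bit module, but with those 8 bits ECC can correct single-bit errors
while byte parity can only detect them.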

From <URL:http://www.eetimes.com/news/98/1012news/ibm.html>:

   Perhaps IBM's biggest challenge is to convince the industry that soft
   errors caused by sub-atomic particles from outer space are a real
   problem to reckon with. Dell, who has written and lectured extensively
   on the subject, acknowledged that many DRAM producers and customers
   are reluctant to accept that soft errors pose much of a threat; the
   errors are often explained away as voltage spikes or are blamed on
   unstable software. And while DRAM manufacturers have put reliability
   testing in place to screen for alpha particles, few have any testing
   methodology to account for soft errors caused by cosmic rays, he
   noted.

(In case you don't remember from high school physics, "alpha particles"
are not "sub-atomic"! ;-)

But then again, how do you tell the difference between a soft error
caused by an un-clamped voltage spike on the power bus and one caused
by a cosmic ray unless you've got a digital storage scope plugged into
every DIMM all the time?  It doesn't matter what the cause is, though --
if you don't find out about it, or it doesn't get corrected by the
hardware, then your long-running NetBSD system will eventually corrupt
data and if you're lucky it'll just crash.  "You pays your money, you
takes your chances."

-- 
							Greg A. Woods

+1 416 218-0098      VE3TCP      <gwoods@acm.org>      <robohack!woods>
Planix, Inc. <woods@planix.com>; Secrets of the Weird <woods@weird.com>