Subject: Re: memtestplus on amd64
To: Hauke Fath <hauke@Espresso.Rhein-Neckar.DE>
From: Steven M. Bellovin <smb@cs.columbia.edu>
List: current-users
Date: 01/06/2007 12:42:34
On Sat, 6 Jan 2007 18:00:55 +0100
Hauke Fath <hauke@Espresso.Rhein-Neckar.DE> wrote:

> At 7:09 Uhr -0800 6.1.2007, paul@whooppee.com wrote:
> >If only a limited number of
> >memory errors is found, is there any (fairly painless)
> >way to make NetBSD skip that address?  It's pretty
> >painful on the wallet to replace 2GB of DDR-400
> >just because of a single error location!  :)
> 
> I had the same problem with a 16 MByte 30 pin SIMM a while back that
> has exactly one (1) sticky bit. When I googled back then, I found
> that there is a linux kernel extension for mapping out bad memory,
> but I didn't find a similar mechanism for other OSes.

Mainframes have had similar facilities for decades, settable by the
operators at run-time.  In fact, they've had them for so long it may be
meaningless as a precedent, because memory technology has changed so
much.  I *think* the feature goes back to the days of real core
storage...
> 
> OTOH, if your time is worth anything, you are probably better off by
> replacing the part, since you'll be back to square one otherwise
> whenever the box in question shows fishy behaviour.
> 
100% agreement -- but the facility is still worth having, as a way to
keep a production box (especially a production box in a remote rack)
running until someone can replace the memory.  (Yes, I know there's
always the danger of corrupted results if more memory goes bad.  There
are also many machines where it's worth taking that sort of chance.)


		--Steve Bellovin, http://www.cs.columbia.edu/~smb