Subject: Re: 256 MB RAM?
To: None <port-i386@NetBSD.ORG>
From: Wolfgang Rupprecht <wolfgang@wsrcc.com>
List: port-i386
Date: 04/21/1999 11:10:27
explorer@flame.org (Michael Graff) writes:
> Wolfgang Rupprecht <wolfgang@wsrcc.com> writes:
> 
> > SunOS 4.x had an interesting approach to parity errors.  1) If the
> > page was unmodified and backed by some disk it would just get it from
> > disk.  2) If it was uncorrectable it would kill only that process.
> 
> I assume it would also mark that page as bad, and not use it anymore,
> too?

It's been a while, and I can't recall what they did, but it would be a
smart thing for some modern OS to pick up that ball and run with it.

The one minor complication is that dram errors fall into a few
categories. 

1) totally random - e.g. alpha particle hits.  Taking a page out of
service for one of these is a waste, but shouldn't hurt too much
seeing how alpha hits should be well under 1/week (month???).

2) weak dram row.  Everything in that ram page (e.g. same addr &
0x0ff?00) is weak.  Taking out the physical page (and/or the ones
around it) will help solve the immediate problem.

3) weak dram column.  Everything with the same column address
(e.g. same addr & 0x000?ff) is weak.  This error requires taking out
every address that maps onto a certain column.  Not really doable with
current MMUs.
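
To make the difference between cases 2 and 3 concrete, here is a rough
sketch that counts how many 4 KB pages would have to be retired in a
256 MB machine.  The address split (low 12 bits = column, next 12 bits =
row) is purely an assumption for illustration, as are all the function
names; a real memory controller may map address bits quite differently.

```python
PAGE_SHIFT = 12                  # 4 KB pages, as on i386
MEM_BYTES = 256 * 1024 * 1024    # 256 MB, as in the subject line
NUM_PAGES = MEM_BYTES >> PAGE_SHIFT

# ASSUMPTION: hypothetical address split, not taken from any real chipset.
COL_BITS = 12                    # addr bits 0..11 select the dram column
ROW_BITS = 12                    # addr bits 12..23 select the dram row


def column_of(addr):
    """Column index of a physical address under the assumed layout."""
    return addr & ((1 << COL_BITS) - 1)


def row_of(addr):
    """Row index of a physical address under the assumed layout."""
    return (addr >> COL_BITS) & ((1 << ROW_BITS) - 1)


def pages_hit_by_weak_row(row):
    # With COL_BITS == PAGE_SHIFT every address in a page shares one row,
    # so checking the page's base address is enough.
    return [p for p in range(NUM_PAGES) if row_of(p << PAGE_SHIFT) == row]


def pages_hit_by_weak_column(col):
    # A page spans every column value under this layout, so each page
    # contains exactly one address that lands on the weak column.
    return [p for p in range(NUM_PAGES)
            if column_of((p << PAGE_SHIFT) | col) == col]


print(len(pages_hit_by_weak_row(0x123)))     # 16 pages, i.e. 64 KB retired
print(len(pages_hit_by_weak_column(0x045)))  # 65536 pages, i.e. all of memory
```

Under these assumed numbers a weak row costs a handful of pages, while a
weak column touches one address in every page, which is why page-level
retirement can't work around it.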

Modern drams are actually not simple row/column designs any more (but
replicated versions of the simple design), so it's likely that only the
row or column in one of the replicated sections is bad.  I've been out
of hardware design for a while, so I'm not 100% sure how this
failure maps onto physical addressing, but I'd suspect it would show
up as bad bits that have a larger stride through memory.  Again, not
really fixable.

-wolfgang
-- 
       Wolfgang Rupprecht <wolfgang+gnus@dailyplanet.wsrcc.com>
		    http://www.wsrcc.com/wolfgang/
DGPS signals via the Internet  http://www.wsrcc.com/wolfgang/gps/dgps-ip.html