Subject: Re: panic: all bets are off...
To: Erik E. Fair <fair@clock.org>
From: Jeremy Cooper <jeremy@broder.com>
List: port-sun3
Date: 05/26/1998 19:45:44

On Tue, 26 May 1998, Erik E. Fair wrote:

> How hard would it be to:
> 
> 1. identify the physical address that caused the trap
> 2. clear the trap
> 3. figure out if the address is user space or kernel space
> 4. if kernel, panic.
> 5. if user, kill the process, zap the contents of that page with zeros to
> reset parity (should only get a parity error on a read, after all)

Recovering from a parity error in the fashion you have suggested above
sounds nice, but it is complicated by the third step - identifying
whether the address at which the parity error occurred was a user space or
kernel space address.  To understand why, you must remember that user
space and kernel space are virtual memory concepts, not physical ones.  A
physical page of memory may reside simultaneously within the kernel
space and within the space of a user process.  The virtual memory system
not only allows this, but uses it as much as possible to its advantage.
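
To make that concrete, here is a rough sketch in C of what "step 3" would
have to look at.  It is a toy model only: the structures and names
(phys_page, mapping, classify_page) are hypothetical and are not the real
pmap interfaces.  It assumes the kernel can enumerate every virtual mapping
of a physical page, and it shows that the answer is not a simple
"user or kernel" but can easily be "both".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model; none of these names are real kernel interfaces. */
enum map_owner { MAP_KERNEL, MAP_USER };

struct mapping {
    enum map_owner owner;   /* which address space holds this mapping */
    unsigned long  va;      /* virtual address the page appears at */
};

struct phys_page {
    unsigned long         pa;     /* physical address of the page */
    const struct mapping *maps;   /* every virtual mapping of this page */
    size_t                nmaps;  /* number of entries in maps */
};

enum page_class {
    PAGE_UNMAPPED,      /* no virtual mapping at all */
    PAGE_KERNEL_ONLY,   /* mapped only in kernel space */
    PAGE_USER_ONLY,     /* mapped only in user space */
    PAGE_SHARED         /* mapped in kernel AND user space at once */
};

/* Walk all mappings of one physical page and report who can see it. */
enum page_class classify_page(const struct phys_page *pg)
{
    int in_kernel = 0, in_user = 0;

    assert(pg != NULL);
    for (size_t i = 0; i < pg->nmaps; i++) {
        if (pg->maps[i].owner == MAP_KERNEL)
            in_kernel = 1;
        else
            in_user = 1;
    }
    if (in_kernel && in_user)
        return PAGE_SHARED;
    if (in_kernel)
        return PAGE_KERNEL_ONLY;
    if (in_user)
        return PAGE_USER_ONLY;
    return PAGE_UNMAPPED;
}
```

A page that comes back as PAGE_SHARED is exactly the case where the
"if kernel, panic; if user, kill" rule has no single answer.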

Consider the case where a user process has memory-mapped a file.  What
happens if a parity error is detected in the physical memory where this
mapping happens to be backed?  Do all the changes that the user
application has made to this page become invalidated and lost?  Sure, that
is acceptable, you might say.  But what if the OS has a proper merged
buffer cache system - that is, what if that page was the only copy of the
file's true contents?  Does the user app get killed and the file's
contents lost forever without a panic?  Not panicking in that case would be
wrong, as the system has now done something unexpected and allowed an
inconsistency to pervade the filesystem.
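
As a rough illustration of the sub-case above, the decision might be
sketched in C like this.  Everything here is hypothetical (page_facts,
parity_decide and the three flags are invented for the example), and it
assumes the kernel could actually know, per page, whether that page holds
the only copy of a file's contents, which is the hard part.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-page facts; a real kernel may not track all of these. */
struct page_facts {
    bool kernel_mapped;  /* page is visible in kernel space */
    bool file_backed;    /* page caches part of a file */
    bool only_copy;      /* contents exist nowhere else (not yet on disk) */
};

enum parity_action { PARITY_KILL_PROCESS, PARITY_PANIC };

/* Decide what to do about a parity error in the described page. */
enum parity_action parity_decide(const struct page_facts *pf)
{
    assert(pf != NULL);

    /* Anything the kernel itself maps cannot be trusted any longer. */
    if (pf->kernel_mapped)
        return PARITY_PANIC;

    /* User-only mapping, but the page was the sole copy of file data:
     * killing the process would silently lose the file's contents. */
    if (pf->file_backed && pf->only_copy)
        return PARITY_PANIC;

    /* Only here does "kill the process" not damage anything else. */
    return PARITY_KILL_PROCESS;
}
```

Note that even a page mapped only in user space ends in a panic once it is
the sole copy of file data, so the user/kernel test alone does not settle
the matter.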

It's the little sub-cases like the one above that make it hard to divide
parity errors into important and not-so-important categories.  The problem
can't be solved by determining whether an error occurs in the kernel or in
user space.  Parity errors will remain simply devastating wherever they
occur so long as we cannot label which bytes in memory are important and
which are not.

-J