Subject: Re: Two troubling problems (CDR and some bad luck)
To: Dave Burgess <burgess@neonramp.com>
From: Chuck Silvers <chuq@chuq.com>
List: current-users
Date: 04/11/1999 23:05:09
Dave Burgess writes:
> 2)  The other little problem is mostly a matter of bad luck.  One of my
> SCSI drives on my Web Server developed a bad spot in the swap space.
> Whenever the system hit it, it would generate a bunch of errors and
> eventually kill the machine.  While I'd rather not, I can probably
> reproduce the error, given enough time.  I was in kind of a huge hurry to
> get the system back on-line, so I didn't bother to note the error.  The fix
> was to simply add another drive and move the sd1b to another spot on the
> disk.  The question (if it is one) is "Has anyone looked at the swap code
> WRT error recovery in case of a medium error?"  I doubt that this requires
> a PR, it's just kind of a general poser.

I have written code to deal with swap errors thusly:
  1.  on write errors, mark the region of swap with the error as
      bad (just in memory, not persistently) and avoid accessing
      that region of swap again.
  2.  on read errors, kill the process that's faulting on the swap page
      (since its address space is now corrupted), or for /dev/drum access
      just fail the read().
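
To make the two rules concrete, here is a rough sketch in C. It is not the
code that is awaiting review, and every name in it (swap_map, swap_alloc,
swap_write_error, swap_read_done, SWAP_SLOTS) is hypothetical. It tracks
swap as a flat array of page slots, which is an assumption made only to
keep the example small.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SWAP_SLOTS 1024   /* hypothetical swap size, in page slots */

/* Hypothetical in-memory bookkeeping; nothing here is written to disk,
 * so the bad-slot marks are lost on reboot. */
struct swap_map {
    bool in_use[SWAP_SLOTS];
    bool bad[SWAP_SLOTS];
};

/* What the caller should do once a swap read has completed. */
enum swap_read_action {
    SWAP_READ_OK,
    SWAP_KILL_PROCESS,  /* page-fault path: address space is corrupted */
    SWAP_FAIL_READ      /* /dev/drum path: return an error from read() */
};

static void swap_map_init(struct swap_map *m)
{
    memset(m, 0, sizeof *m);
}

/* Return the first slot that is neither in use nor marked bad,
 * or -1 if no such slot exists. */
static int swap_alloc(struct swap_map *m)
{
    for (int i = 0; i < SWAP_SLOTS; i++) {
        if (!m->in_use[i] && !m->bad[i]) {
            m->in_use[i] = true;
            return i;
        }
    }
    return -1;
}

/* Rule 1: on a write error, mark the slot bad so the allocator never
 * hands it out again. The page is still in memory, so the caller can
 * allocate a different slot and retry the write. */
static void swap_write_error(struct swap_map *m, int slot)
{
    assert(slot >= 0 && slot < SWAP_SLOTS);
    m->bad[slot] = true;
    m->in_use[slot] = false;
}

/* Rule 2: on a read error the data is gone, so the only choice is how
 * to report it: kill the faulting process, or fail the read(). */
static enum swap_read_action swap_read_done(bool io_error, bool from_page_fault)
{
    if (!io_error)
        return SWAP_READ_OK;
    return from_page_fault ? SWAP_KILL_PROCESS : SWAP_FAIL_READ;
}
```

The asymmetry is the point: a failed write loses nothing because the page
is still resident, while a failed read cannot be recovered from.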

These changes are awaiting review and will hopefully be included in
NetBSD 1.5 (or whatever the release after 1.4 is called).

-Chuck