Subject: Re: No swap?
To: tech-embed@netbsd.org, tech-kern@netbsd.org
From: der Mouse <mouse@Rodents.Montreal.QC.CA>
List: tech-kern
Date: 04/08/2002 16:45:26
> 1) Fail further requests
> 2) Hang trying to find memory that doesn't exist
> 3) Kill something at random to free up memory.
> Right now the kernel does a combination of 2 and 3.

Actually, it also does 1 to some extent.

> This means that when the event occurs that claims the page, it may not
> exist.  When this happens, the operation that requested the page has
> long since finished with a successful return, so the application
> thinks it already owns the page.  So the kernel cannot simply fail
> the operation by returning a failure code.  And since nothing else is
> freeing up memory, the choices are limited.

The kernel _can_ fail the operation.  In a sense, that's what happens:
"Killed: out of swap" is a rather drastic failure mode.  I'd prefer to
convert that to a new signal rather than SIGKILL: one which still
kills by default, but which can be caught.  Then processes that care
can mlock the relevant code, handle that signal on a (locked) signal
stack, and do something appropriate.
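
Concretely, the setup such a process might do could look like this
sketch - SIGUSR1 stands in for the hypothetical new signal, which of
course doesn't exist yet; the sigaltstack/mlock parts are just
standard usage:

/* Sketch only: SIGUSR1 stands in for the proposed signal. */
#include <signal.h>
#include <sys/mman.h>

#define HANDLER_STACK_SIZE (64 * 1024)

static char handler_stack[HANDLER_STACK_SIZE];

static void
memlost(int sig)
{
	/* Free something previously reserved, log the event, or
	 * begin a clean shutdown; must touch only locked pages. */
}

int
main(void)
{
	stack_t ss;
	struct sigaction sa;

	/* Lock the handler's stack into core; a real program would
	 * also lock the handler's text and data (or use mlockall). */
	mlock(handler_stack, sizeof(handler_stack));

	ss.ss_sp = handler_stack;
	ss.ss_size = sizeof(handler_stack);
	ss.ss_flags = 0;
	sigaltstack(&ss, NULL);

	sa.sa_handler = memlost;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = SA_ONSTACK;	/* run handler on the locked stack */
	sigaction(SIGUSR1, &sa, NULL);	/* stand-in for the new signal */

	/* ... the rest of the program ... */
	return 0;
}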

In particular, it would help my test programs immensely: I have
programs that want to allocate memory until they run out, and then do
something (typically, free up some command-line-specified amount of
it).  Getting a SIGKILL upon running out of memory makes that somewhat
harder.
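
Under the proposed scheme, such a test program could look something
like this sketch (SIGUSR1 once again standing in for the hypothetical
signal; the amounts are hardwired where the real programs take them
from the command line, and the mlock/sigaltstack setup from the
previous sketch is omitted for brevity):

/*
 * Allocate-until-out-of-memory test sketch.  With overcommit,
 * malloc() may succeed and the failure may show up only when the
 * pages are actually touched; the handler catches that case.
 */
#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK (1024 * 1024)	/* allocate 1MB at a time */
#define MAXCHUNKS 1048576
#define FREEBACK 16		/* chunks to free once we run out */

static void *chunk[MAXCHUNKS];
static sigjmp_buf oom;

static void
memlost(int sig)
{
	siglongjmp(oom, 1);
}

int
main(void)
{
	volatile int n = 0;
	int i;
	struct sigaction sa;

	sa.sa_handler = memlost;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = 0;
	sigaction(SIGUSR1, &sa, NULL);	/* stand-in for the new signal */

	if (sigsetjmp(oom, 1) == 0) {
		while (n < MAXCHUNKS) {
			if ((chunk[n] = malloc(CHUNK)) == NULL)
				break;	/* refused at request time */
			memset(chunk[n], 1, CHUNK);	/* force real pages */
			n++;
		}
	}

	/* Out of memory one way or the other: free some of it back. */
	for (i = 0; i < FREEBACK && n > 0; i++)
		free(chunk[--n]);
	printf("holding %d chunks after freeing %d back\n", (int)n, i);
	return 0;
}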

> The correct solution to this problem is simple: [don't overcommit]

Except that, as you point out, that wastes resources by reserving
backing store that never gets used.  It would also mean some fairly
extensive changes.

Given the possibility of a catchable signal on failure to convert COW
or ZFOD to a `real' page, and errors reflected back up the call stack
to the failing syscall for the rest, there's no reason in principle why
all allocation errors - caused by overcommit or anything else - can't
be reflected back to userland.
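
The syscall half of that already has a well-understood shape; for
instance, mmap() reports allocation failure through its return value,
so the caller can react immediately instead of being killed at some
later page fault:

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

static void *
try_alloc(size_t len)
{
	void *p;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
	    MAP_ANON | MAP_PRIVATE, -1, 0);
	if (p == MAP_FAILED) {
		/* Failure at request time: the caller sees ENOMEM
		 * (or the like) and can react, instead of dying at
		 * some later page fault. */
		fprintf(stderr, "mmap(%lu): %s\n",
		    (unsigned long)len, strerror(errno));
		return NULL;
	}
	return p;
}

int
main(void)
{
	return try_alloc(1UL << 20) == NULL;
}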

> So, getting back to the original point, if you really are locking up
> due to lack of pages, there is nothing that can be done because you
> simply need more memory than you have on that machine.

Well, yes; as I think I mentioned, this condition indicates a critical
overload.  But it would be nice to allow userland to recover
gracefully.

Tracking and preventing/limiting overcommit is one way.  But another
way, which involves less intrusive meddling in the current state of
the kernel, is to arrange for processes that aren't trying to
allocate to keep running, instead of letting the allocation attempts
livelock everything else out, and then to provide a way to alert a
monitoring process to the situation and let it figure out what to
kill off to free up enough resources to go down gracefully.
Basically, something like approach 3 above, except that "randomly"
no longer applies and the decision is made by userland.

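A very rough sketch of such a monitor follows - everything about the
notification mechanism is assumed here, with SIGUSR1 once more
standing in for the proposed signal, delivered by the kernel to a
registered monitor, and pick_victim() a placeholder for whatever
policy the application supplies:

/*
 * Userland out-of-memory monitor sketch.  A real monitor would
 * also mlock itself so it can run while the system is thrashing.
 */
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

static volatile sig_atomic_t pressure;

static void
note_pressure(int sig)
{
	pressure = 1;
}

static pid_t
pick_victim(void)
{
	/* A real policy would consult application knowledge of
	 * what is expendable; returning 0 means "nothing chosen". */
	return 0;
}

int
main(void)
{
	struct sigaction sa;
	sigset_t block, wait;
	pid_t victim;

	sa.sa_handler = note_pressure;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = 0;
	sigaction(SIGUSR1, &sa, NULL);

	/* Keep the signal blocked except while waiting, so no
	 * notification slips by between checks. */
	sigemptyset(&block);
	sigaddset(&block, SIGUSR1);
	sigprocmask(SIG_BLOCK, &block, &wait);
	sigdelset(&wait, SIGUSR1);

	for (;;) {
		sigsuspend(&wait);	/* wait for a notification */
		if (pressure) {
			pressure = 0;
			victim = pick_victim();
			if (victim > 0)
				kill(victim, SIGTERM);	/* ask politely first */
		}
	}
}
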
For our application, this isn't a bad match; we already have a system
component into whose demesne the task of deciding what to kill seems to
fit reasonably well.  I mentioned it here for three reasons:

- To get feedback on the ideas;
- To offer those parts of our solution that we can release to the
   community, at least a few of whom probably have to deal with a
   similar situation;
- To (attempt to) see if any of it can go back into the main tree,
   which besides potentially benefiting everyone will reduce our
   roll-forward effort in the future.

Certainly, dealing with the overcommit at the source is a better
long-term solution.  But it's also a lot more work, and we want to keep
our deltas to the main NetBSD tree minimal in both time-to-do (there's
always a deadline coming up) and code-touched (to keep our roll-forward
effort as small as possible).

/~\ The ASCII				der Mouse
\ / Ribbon Campaign
 X  Against HTML	       mouse@rodents.montreal.qc.ca
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B