Subject: Re: No swap?
To: None <mouse@Rodents.Montreal.QC.CA, tech-embed@netbsd.org, tech-kern@netbsd.org>
From: None <eeh@netbsd.org>
List: tech-embed
Date: 04/08/2002 19:41:36
| I'm working with a proprietary PowerPC port, presently based on 1.5W.
| We've been running into problems that appear to stem from the kernel's
| misbehaving when it runs out of swap - in our case, we have no swap, so
| we're "out of swap" right from the start.

One word:  Overcommit.

I'm sorry if I'm repeating something everyone else is familiar with, but
this is a well-known problem.  When you run out of memory you have
basically 3 options:

1) Fail further requests

2) Hang trying to find memory that doesn't exist

3) Kill something at random to free up memory.

Right now the kernel does a combination of 2 and 3.

Because of the way we rely on copy-on-write (COW) and zero-fill-on-demand
(ZFOD) for better performance, the kernel often promises to provide pages
that may be required in the future, but does not actually allocate them
until they are needed.  This means that when the event occurs that claims
a page, the page may not exist.  By that point the operation that
requested the page has long since completed with a successful return, so
the application believes it already owns the page.  The kernel therefore
cannot simply fail the operation by returning an error code, and since
nothing else is freeing up memory, the choices are limited.
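To make this concrete, here is a minimal sketch of the failure mode
(plain C using standard mmap(); the 512MB size is an arbitrary choice
for illustration):

    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    int
    main(void)
    {
        size_t len = 512UL * 1024 * 1024;
        char *p;

        /* mmap() succeeds immediately: the kernel promises the
         * pages but allocates none of them yet. */
        p = mmap(NULL, len, PROT_READ | PROT_WRITE,
            MAP_ANON | MAP_PRIVATE, -1, 0);
        if (p == MAP_FAILED) {
            perror("mmap");    /* the only failure we can see */
            return 1;
        }

        /* Each first write claims one promised (ZFOD) page.  If
         * memory runs out here, mmap() returned success long ago,
         * so there is no error code to hand back: the process
         * hangs or something gets killed instead. */
        memset(p, 0xff, len);

        printf("claimed all %zu bytes\n", len);
        return 0;
    }

Note that on a machine short of memory the trouble hits at the
memset(), not at the mmap().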

The correct solution to this problem is simple:

1) Track every operation that causes the kernel to commit itself to
providing pages in the future.

2) Fail any operations that cause the kernel to enter a state of
overcommitment.

3) Change userland coding practices to avoid methods that require the
commitment of large numbers of pages: e.g. use vfork()/exec() instead of
fork()/exec() (see the sketch after this list), and mprotect() read-only
any pages that you don't want to COW or ZFOD later.
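For point 3, a minimal vfork()/exec() sketch looks like this (standard
libc calls only; /bin/echo stands in for whatever you actually exec):

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/wait.h>

    int
    main(void)
    {
        pid_t pid;

        /* vfork() lends the parent's address space to the child,
         * so the kernel commits no COW pages for it. */
        pid = vfork();
        if (pid == -1) {
            perror("vfork");
            return 1;
        }
        if (pid == 0) {
            /* Child: only exec or _exit are safe after vfork(). */
            execl("/bin/echo", "echo", "hello", (char *)NULL);
            _exit(127);    /* exec failed */
        }
        (void)waitpid(pid, NULL, 0);
        return 0;
    }

With fork() the kernel would instead have to promise a writable copy of
every data and stack page in the parent, even though the exec() throws
them all away a moment later.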

Now there are a few operational issues involved.  In practice it is not
possible to prevent the kernel from entering a state of overcommit, nor
is it desirable.  You just want to limit how deep into the hole the
kernel can get.

Since the kernel may need to allocate pages for internal use, and some
page fault handling operations may generate COW or ZFOD mappings, you
can't really prevent the kernel from getting itself into an overcommit
state.  In most cases many of the promised pages will never be claimed.
For instance, you almost never use more than a small fraction of the
stack space allowed by your stack limit.  And while shared libraries on
some architectures require writing to the text segment, that is usually
a very small fraction of the total program text.  I have done a small
amount of testing, and a ballpark estimate is that 20% of memory is
promised but never claimed.
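If you want to see how large the stack promise alone is on a given
system, something like this will show it (output format is mine):

    #include <stdio.h>
    #include <sys/resource.h>

    int
    main(void)
    {
        struct rlimit rl;

        if (getrlimit(RLIMIT_STACK, &rl) == -1) {
            perror("getrlimit");
            return 1;
        }
        /* The whole soft limit is promised to the process, but
         * only the pages actually touched ever get allocated. */
        printf("stack promised: %lu KB\n",
            (unsigned long)(rl.rlim_cur / 1024));
        return 0;
    }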

So, getting back to the original point: if you really are locking up
due to lack of pages, there is nothing that can be done, because you
simply need more memory than you have on that machine.  However, if
you want to handle that situation gracefully, you need to track and
limit overcommit.

Eduardo