tech-kern: Re: No swap?

Subject: Re: No swap?
To: None <mmondor@gobot.ca>
From: Roland Dowdeswell <elric@imrryr.org>
List: tech-kern
Date: 04/14/2002 02:15:18
On 1018666879 seconds since the Beginning of the UNIX epoch
Matthew Mondor wrote:
>
>Is there any important particular reason why the kernel should always
>guarantee to provide more memory to a demanding process (or force it
>to sleep until it gets some), rather than immediately reply that the
>request could not be fulfilled? (memory and swap temporarily full)?

The problem is that in the allocation model that we use, there are
a lot of memory allocations that are delayed until referenced on
the presumption that many of them will never actually be needed.

So, if you consider a process like /bin/sh doing a command

	$ ( ls )

It is guaranteed that the original shell will fork(2) a subshell,
which will then run ls(1), requiring an additional fork(2).  So, before ls(1)
is execve(2)ed, there are three copies of the original shell in
memory: the interactive one, the subshell and the pre-execve(2)
child.  Now, since fork(2) uses Copy-on-Write semantics, the pages
of the original shell are not actually copied but rather marked to
be copied by the children upon changes.  In all likelihood (in this
example) a great number of those COW pages will not actually be
changed and therefore will not be copied.
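
To make the copy-on-write behaviour concrete, here is a minimal
user-space sketch.  It is in Python rather than C purely for brevity,
the function name fork_and_modify is made up for this illustration,
and it assumes a UNIX-like system where os.fork() is available.  It
only shows the semantics visible to the process (the child's write
does not affect the parent); it does not observe the kernel's page
copies directly, and the Python runtime itself touches far more pages
than a small C program would.

```python
import os


def fork_and_modify():
    """Fork once; the child changes a buffer, the parent does not.

    Returns (value seen by the parent, value reported by the child).
    """
    data = bytearray(b"original")
    read_fd, write_fd = os.pipe()

    pid = os.fork()
    if pid == 0:
        # Child: this store is the kind of write that makes the kernel
        # copy a shared page.  No system call is made to ask for memory.
        os.close(read_fd)
        data[:] = b"modified"
        os.write(write_fd, bytes(data))
        os._exit(0)

    # Parent: its view of the buffer is untouched by the child's write.
    os.close(write_fd)
    child_view = os.read(read_fd, 64)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return bytes(data), child_view


if __name__ == "__main__":
    print(fork_and_modify())
```

Note that the child never asks for memory explicitly; the copy happens
as a side effect of an ordinary assignment.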

But the interesting thing to note is that, as far as the kernel is
concerned, the allocation event is the modification of a page, and
since the application doesn't actually know that it is causing an
allocation event, it is unlikely to be prepared for an error code.
In fact, since the modification may just be the setting of a
variable, there isn't a general way to return an appropriate error.

Prior emails on this thread mentioned that a potential solution
would be to prevent overcommit, that is, to ensure that there is enough
memory to satisfy all of the copy-on-write mappings in the event
that they were all changed.  In the past, some people have argued
that this would be unduly restrictive, based on the presumption that
a large number of those pages will never be allocated.  Some people
even proposed that preventing overcommit would make running
without swap infeasible and require enormous amounts of basically
unused swap.
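
As a rough illustration of why strict accounting is considered
restrictive, the following sketch charges every COW page in full.
The function name and all page counts are hypothetical; this is not
how any particular kernel does its bookkeeping.

```python
def strict_commit_ok(cow_pages_per_process, committed_pages, total_pages):
    """Strict (no-overcommit) check: every COW page is charged in full.

    cow_pages_per_process: COW page counts, one entry per forked child.
    committed_pages:       pages already backed by real memory or swap.
    total_pages:           RAM plus swap, in pages.
    """
    worst_case = committed_pages + sum(cow_pages_per_process)
    return worst_case <= total_pages


if __name__ == "__main__":
    # Hypothetical numbers: a 200-page shell with two forked children,
    # on a machine with 500 pages of RAM plus swap.
    print(strict_commit_ok([200, 200], 200, 500))
```

With these made-up numbers the worst case is 600 pages against 500
available, so the second fork(2) would have to fail even though very
few of those pages are ever likely to be copied.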

I think that the last time that this discussion came up, I suggested
that there is a middle-ground approach where the kernel would use
a simple heuristic so as not to allow more COW-allocated pages than
are likely to be used.  For that, you'd just come up with a
relatively conservative number n for the chance that any particular
COW mapping would be hit and count that page as n pages rather than
1 page in the calculation for how much memory is committed.  If
the n were tunable via sysctl(8), this solution would be both more
general than a strict commit policy and would `probably' never
leave one in the overcommit situation.  The heuristic might even
change n based on the age of the COW mappings, but I'd have to run
some simulations to see which way one would be likely to go with
that one.
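
A minimal sketch of that weighted accounting, assuming n is a
probability between 0 and 1.  The function names and the page counts
are invented for illustration, and no such sysctl variable is implied
to exist.

```python
def weighted_commit(cow_pages, other_pages, n):
    """Committed total when each COW page counts as n pages (0 <= n <= 1)."""
    if not 0.0 <= n <= 1.0:
        raise ValueError("n must be a probability between 0 and 1")
    return other_pages + n * cow_pages


def admit(cow_pages, other_pages, total_pages, n):
    """Allow the new mapping only if the weighted total still fits."""
    return weighted_commit(cow_pages, other_pages, n) <= total_pages


if __name__ == "__main__":
    # Hypothetical numbers: 400 COW pages, 200 other committed pages,
    # 500 pages of RAM plus swap.
    for n in (1.0, 0.5, 0.25):
        print(n, weighted_commit(400, 200, n), admit(400, 200, 500, n))
```

Setting n to 1 gives the strict commit policy, and setting it to 0
gives unrestricted overcommit, so a single tunable covers the whole
range between the two.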

>It would then consist of the process's responsibility (as well as libc)
>to ensure to properly check error conditions and act appropriately,
>the kernel would never find itself in any unpredictable state... Of
>course badly written software assuming that memory is always available
>would segfault as expected trying to access unallocated memory...
>libc standards always define malloc() error conditions, which it could
>report to the user program when the kernel failed to provide the
>requested resource immediately, which a program should take into
>account
>
>Thanks
>Matt
>

 == Roland Dowdeswell                      http://www.Imrryr.ORG/~elric/  ==
 == The Unofficial NetBSD Web Pages        http://www.Imrryr.ORG/NetBSD/  ==
 == The NetBSD Project                            http://www.NetBSD.ORG/  ==