Subject: Re: userid partitioned swap spaces.
To: NetBSD Kernel Technical Discussion List <tech-kern@netbsd.org>
From: Greg A. Woods <woods@most.weird.com>
List: tech-kern
Date: 12/18/1998 12:02:06
[ On Fri, December 18, 1998 at 11:49:24 (+1030), Ian Dall wrote: ]
> Subject: Re: userid partitioned swap spaces.
>
> I think people have the view that if they do a malloc and check for
> NULL then they have "done the right thing" and that their process should
> be safe.

Normally that *should* be the case....

> Unfortunately it is not as simple as that. Stack space for example can
> always grow and there is no mechanism for a process to be able to
> detect and handle inability to allocate stack space. Yes, you could
> always allocate enough swap to cover the maximum possible stack size,
> but it gets horribly conservative. The single user box I am currently
> typing this on has 37 processes and 8MB per process stack limit. Maybe
> I am stingy, but the idea of 300MB swap just for the stack allocations
> seems excessive to me! It gets worse when you allow for pages which
> may be shared many times COW and userland threads which need a stack
> for each thread. Variable length arrays in C9x are likely to
> exacerbate the problem.
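
As a rough check of the figure quoted above: 37 processes times an 8MB stack limit comes to 296MB, which is the "300MB" in question. A minimal sketch of that arithmetic follows. The function name `worst_case_stack_mb` is hypothetical, and the calculation deliberately ignores the COW sharing and per-thread stacks mentioned in the quote.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Worst-case swap (in MB) that would have to be reserved if every
 * process were allowed to grow its stack to the full per-process limit.
 * Hypothetical helper for illustration only; it does not account for
 * threads or copy-on-write sharing.
 */
uint64_t worst_case_stack_mb(uint64_t nprocs, uint64_t stack_limit_mb)
{
    /* Guard against overflow in the multiplication. */
    assert(stack_limit_mb == 0 || nprocs <= UINT64_MAX / stack_limit_mb);
    return nprocs * stack_limit_mb;
}
```

Called as `worst_case_stack_mb(37, 8)` this gives 296.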

System limits are a true upper limit.  They are not really meant to
be a guideline for resource allocation, especially if they're assigned
globally to all processes in the system.  Resource allocation, such as
deciding how much swap space to reserve for the entire system, must be
done based on the maximum amount of space that one guesses the average
mix of processes will require.

In some circumstances one can indeed use system limits as a better
approximation for resource allocation if those limits are carefully
assigned to different groups of users based perhaps on the degree of
trust one assigns to that group.  FreeBSD (and I think BSD/OS) has
implemented this scheme with the "login class" field and
/etc/login.conf.

Also note that a process which runs out of stack space, either because it
has exceeded the maximum limit assigned to it or simply because there's
no more swap space available, must (at least in the Unix way of doing
things) be sent some kind of signal.  Whether the runtime system tries to
help the process keep running by re-arranging stack use or sleeping or
something, or whether it is simply killed, depends on, well, on whether
or not the program was programmed to handle this kind of "exception".
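
For illustration, here is one way a program might be "programmed to handle" such a signal on a POSIX system that provides sigaltstack(2). This is only a sketch under assumptions: it assumes the kernel reports stack exhaustion as SIGSEGV (some systems use SIGBUS), and the names `on_stack_fault` and `install_stack_fault_handler` are hypothetical. The handler has to run on an alternate stack because the normal one is, by definition, unusable at that point.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Runs on the alternate stack; only async-signal-safe calls are used. */
static void on_stack_fault(int signo)
{
    static const char msg[] = "stack exhausted, exiting\n";
    ssize_t n = write(STDERR_FILENO, msg, sizeof msg - 1);

    (void)signo;
    (void)n;
    _exit(EXIT_FAILURE);
}

/*
 * Install a SIGSEGV handler that runs on a separately allocated stack.
 * Returns 0 on success, -1 on error.
 */
int install_stack_fault_handler(void)
{
    stack_t ss;
    struct sigaction sa;

    /* Memory for the handler to run on when the main stack is gone. */
    ss.ss_sp = malloc(SIGSTKSZ);
    if (ss.ss_sp == NULL)
        return -1;
    ss.ss_size = SIGSTKSZ;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) == -1) {
        free(ss.ss_sp);
        return -1;
    }

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_stack_fault;
    sa.sa_flags = SA_ONSTACK;   /* deliver on the alternate stack */
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

A program that does nothing of the sort simply gets the default action for the signal, which is to be killed.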

> Finally, it seems to me the goal is to prevent accidental or
> deliberate DoS by consuming swap. Merely preventing overcommit does
> mean that processes don't get killed arbitrarily, but it doesn't
> prevent DoS by consuming swap because if nothing new can run, even a
> root login or top, ps or kill, then the system is pretty irretrievably
> wedged anyway.

Well, actually I'd be happy if the system could simply recover from such
a DoS attack -- I don't really care to prevent them (that takes away too
much of my rope!).

> I didn't specify, but my scheme needs some mechanism for firing off a
> process, or waking up an existing process, when swap reaches the high
> water mark.

Regardless of how aggressively the system tries to recover from a VM
over-commit situation, there should still be a high-water mark that can
be set far enough below the physical limit to allow the superuser to
have a wee bit of room to work with *IF* indeed there's a superuser
handy to do such things....
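
The high-water-mark idea can be sketched as a simple admission check. Everything here is hypothetical (the `swap_state` structure, its field names, and `swap_may_reserve`); it is not taken from any actual NetBSD VM code, and only illustrates the policy: ordinary requests stop at the high-water mark, while the superuser may use what lies between it and the physical limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bookkeeping for swap reservations, in pages. */
struct swap_state {
    uint64_t total_pages;       /* physical swap capacity */
    uint64_t used_pages;        /* pages currently reserved */
    uint64_t high_water_pages;  /* ceiling for non-superuser requests */
};

/*
 * Decide whether a request for npages may be granted.  Ordinary users
 * are limited to the high-water mark; the superuser may go up to the
 * physical limit.
 */
bool swap_may_reserve(const struct swap_state *s, uint64_t npages,
                      bool is_superuser)
{
    uint64_t limit;

    assert(s->high_water_pages <= s->total_pages);
    assert(s->used_pages <= s->total_pages);

    limit = is_superuser ? s->total_pages : s->high_water_pages;
    if (s->used_pages > limit)
        return false;           /* already past this caller's ceiling */
    return npages <= limit - s->used_pages;
}
```

With 1000 pages of swap, 940 in use, and the mark set at 950, an ordinary process can get 10 more pages but not 11, while the superuser still has 60 to work with.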

(And no, there shouldn't be any way to assign the space above the limit
to some other user-id or group-id.  This is unix, with one, and only
one, superuser ID; not multics.)

-- 
							Greg A. Woods

+1 416 218-0098      VE3TCP      <gwoods@acm.org>      <robohack!woods>
Planix, Inc. <woods@planix.com>; Secrets of the Weird <woods@weird.com>