Subject: Re: CVS commit: syssrc/sys/kern
To: Jaromir Dolecek <jdolecek@netbsd.org>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 12/09/2002 11:24:30
On Sun, 8 Dec 2002, Jaromir Dolecek wrote:

> Bill Studenmund wrote:
> > Well, that seems like part of the problem. You have a view of what is
> > proper behavior, and part of that seems to be that running into the
> > process limit is punishable; if you're running into the limit, you're
> > mis-behaving.
>
> Yes, this is more or less my view.
>
> Definitely a process running into its limits is more suitable for
> punishment than a random other process. I believe that the limits
> are supposed to be set so that the users/system are able to do
> their work, but still catch runaway cases.  So if the limit _is_
> reached, it's IMHO fine to use drastic measures.

Why do you need drastic measures? Just deny the fork.

The problem is that running into the limit doesn't mean you're a runaway
process. A big and busy one, but not a runaway.

I can see that when we hit the max, we're in one of three cases:

1) A busy, non-malicious daemon that doesn't/can't keep track of its
children (and more importantly grandchildren, etc.) and so lets the kernel
do it.

2) A poorly-written runaway (say student project).

3) A malicious DoS program.

How do you tell them apart? If you could apply the 0.5 second delay to
only cases 2 & 3, then that'd be fine. But you can't.

> There is some prior art, even. During out-of-memory condition,
> processes asking for more memory are killed.  Processes not asking
> for more memory can continue running happily.  Similarly with CPU
> limits: a process is terminated if it reaches the limit.

Come on. Those aren't comparable.

A program can keep running if it can't fork. It CAN'T keep running if it
hits its CPU limits (that's kinda the point of CPU limits, after all). So
that comparison's not appropriate.

As for memory exhaustion, our current behavior is very simplistic. My
understanding is it dates from a time when programs didn't deal well with
memory shortage, and so denying the memory would cause a crash. So killing
it 1) has the same effect, 2) frees more memory more quickly than waiting
for the program to die.

Now I'm not saying the above is what we should keep doing; we probably
should change it. But it certainly isn't a good rationale for other
behaviors.

> Process slots are not that scarce. The limit for number of
> processes is very seldom reached. When it _is_ reached, it
> is very likely that Something isn't behaving properly; either

No, it's not. And that's what you're not getting. While it may be that
something's wrong, it's not certain, and under a number of circumstances,
not likely.

These limits are and have been non-punitive. Changing that is a big
change, and the fact that you've had to guess on things makes it seem
poorly thought out.

> there is some Unexpected Load, DoS going on, silly mistake
> of local user, or something is misconfigured. In all these
> cases, the induced sleep helps administrator to get things under
> control more easily.

How do you know that the administrator is the best person to deal with
Unexpected Load, as you put it? For a busy server running right where it's
supposed to (i.e. at its limit), we aren't in an "Admin-god needs to come
and save us" situation.

> There are no mysterious failures caused by the sleep. If the

How do you know? What research have you done? Up until now, you've spoken
from opinion, which isn't good enough for this change.

> out-of-slots condition passes, all system activity goes to normal
> shortly. If the out-of-slots condition continues, the processes
> most likely causing trouble (those forking) are punished. Maybe
> it's not quite ideal behaviour, but it's quite a good approximation
> and has zero overhead cost.
>
> > I think it's perfectly fine to fork until you can't. The kernel has to
> > keep track of your process limit, and can politely tell you you've hit it.
> > Why duplicate that in userland? Also, by doing that, if the kernel limit
> > is ever changed, you immediately can take advantage of it.
>
> Yes, I think it's perfectly fine to do that as well. Just don't
> expect to do that AND get the resources immediately all the time.

Why not? Until your change, it'd been a fine thing to do FOR DECADES.

Also, it's one thing to block waiting for a process slot to open up, but
that's not what you've done. The process waits for an absolute amount of
time.

You've made an unconfigurable change that impacts what has been acceptable
behavior FOR DECADES. Either revert it, or sysctl it with a default of
zero. Now.
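
If it isn't reverted, the sort of thing I could live with looks like the
sketch below. The names (fork_fail_sleep, "forkslp") are ones I just made
up, not anything in the tree; the point is that the knob gets exported via
sysctl and defaults to 0, so nothing changes unless an administrator
explicitly asks for the delay:

int fork_fail_sleep = 0;	/* ticks to sleep when fork() fails;
				 * 0 = historic behaviour, settable
				 * through a sysctl */

	/* ... in fork1(), at the point where the limit check fails ... */
	if (fork_fail_sleep != 0)
		(void)tsleep(&nprocs, PUSER, "forkslp", fork_fail_sleep);
	return (EAGAIN);

With the default of zero the code path is exactly what it has been for
decades; a site that actually wants the punitive behaviour can turn it on
and tune it, instead of everyone getting a hard-wired half second.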

Take care,

Bill