Subject: Re: Fork bomb protection patch
To: None <dyoung@pobox.com>
From: Thor Lancelot Simon <tls@rek.tjls.com>
List: tech-kern
Date: 12/07/2002 01:00:32
On Fri, Dec 06, 2002 at 08:59:15PM -0600, David Young wrote:
> 
> Thor,
> 
> I do not see how rlimits are sufficient to stop a fork bomb. RLIMIT_CPU
> applies to p_rtime, which a child does not inherit from its parent, so
> every fork() gives a fork bomb a new lease on life.  Also, RLIMIT_NPROC
> does not seem like sufficient protection: a bomb can use more than its
> "fair share" of resources without ever exceeding RLIMIT_NPROC. What am
> I missing?

What you're missing is that the cpu time limit prevents any single process
from using more than a given amount of CPU time -- whether looping around
a system call that fails or not -- and the process limit will prevent
any user from creating more than a given number of processes.
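
As a rough illustration of the two limits described above, here is a minimal
sketch that applies a CPU time limit and a process limit to a child before it
execs a command. The helper names (capped, apply_limits, run_limited) and the
default values of 60 seconds and 15 processes are assumptions made for this
example, not anything taken from the patch under discussion.

```python
import resource
import subprocess


def capped(requested, current):
    """Return a (soft, hard) pair whose soft limit is `requested`,
    clamped so that it never exceeds the existing hard limit."""
    _soft, hard = current
    if hard != resource.RLIM_INFINITY:
        requested = min(requested, hard)
    return (requested, hard)


def apply_limits(cpu_seconds, max_procs):
    """Set the soft RLIMIT_CPU and RLIMIT_NPROC of the calling process."""
    resource.setrlimit(
        resource.RLIMIT_CPU,
        capped(cpu_seconds, resource.getrlimit(resource.RLIMIT_CPU)),
    )
    resource.setrlimit(
        resource.RLIMIT_NPROC,
        capped(max_procs, resource.getrlimit(resource.RLIMIT_NPROC)),
    )


def run_limited(argv, cpu_seconds=60, max_procs=15):
    """Run argv with both limits applied in the child, between fork and exec.

    The default values are illustrative assumptions only.
    """
    return subprocess.run(
        argv, preexec_fn=lambda: apply_limits(cpu_seconds, max_procs)
    )
```

Something like run_limited(["make"], cpu_seconds=300) would then run the
command under both limits. Only the soft limits are changed here, so a process
could raise them again up to the hard limit; an administrator who wants them
to stick would lower the hard limits as well.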

Mind you, if one instance of the forkmonster dies because it's managed to
run up to its CPU time limit, another will manage to fork -- once.  But the
CPU time limit is useful for preventing runaway processes from looping around
system calls that do not spawn children, which is hardly uncommon.

In practice, users on timesharing systems seldom need more than 10 or 15
simultaneous processes -- in fact, fewer will often suffice.  Even with the
default per-user process limit -- which, I would
contend, is set far too high at 160; it never used to be that high and
I do not believe that it should be that high now -- it's possible for the
superuser to clean things up easily enough, though the system will slow to
a crawl while he's doing so.  The ultimate solution lies in cleaning up the
users in question, if they're malicious or stupid enough to keep running
forkmonsters.

I would support lowering the default per-user process limit back to 80; I
suspect it was raised for the sake of convenience for those who run huge
numbers of server processes under a single UID (e.g. apache) but don't
understand how to raise the resource limit, which was a silly reason if
it's why it was done.  Certainly it was a mistake to raise the *per-user*
limit but not the *system-wide* limit, as each user can now consume twice
as many slots in a table of limited size.
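
The arithmetic behind that last point can be sketched as follows. The
per-user limits of 80 and 160 come from the text above; the table size of
1000 slots is a purely hypothetical figure chosen for illustration, and the
function name is likewise invented for this example.

```python
import math


def users_to_fill_table(table_slots, per_user_limit):
    """Smallest number of users, each running at the per-user limit,
    needed to occupy every slot in the process table."""
    return math.ceil(table_slots / per_user_limit)


TABLE_SLOTS = 1000  # hypothetical system-wide table size, not a real default

print(users_to_fill_table(TABLE_SLOTS, 80))   # 13
print(users_to_fill_table(TABLE_SLOTS, 160))  # 7
```

With the system-wide limit held fixed, doubling the per-user limit roughly
halves the number of misbehaving users it takes to exhaust the table.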

Furthermore, I suggest that the incompetence of the average modern system
administrator, and his unwillingness either to consider appropriate resource
limits for the users of his system or to learn or build the simple tools he
needs to deal with users who misbehave, are no good reason to do dumb things
like make system calls mysteriously sleep for half a second to deal with
broken or evil programs that loop around them.

This is an age-old problem, and it's easily addressed well enough with age-
old tools.  If we want to solve it _better_, fine -- but the solution that
was hastily committed to our tree is not an example of "better", it's an
example of "panicked misunderstanding of a well-known problem".

Thor