Subject: Scheduler hints (Re: CVS commit: syssrc/sys/kern)
To: Jaromir Dolecek <jdolecek@netbsd.org>
From: Daniel Carosone <dan@geek.com.au>
List: tech-kern
Date: 12/07/2002 12:59:13
For a system on the edge of collapse, where the swarm of processes
is trying to do work (i.e., user CPU time) as well as fork(), I'm
not sure that any given process would get much CPU anyway if it
hadn't been put to sleep. The swarm of others would steal it all.

It's only the ones that fork (trying to create more work for the
overloaded system) that get put to sleep.  If they're forkmonsters
that do nothing else, they all wind up sleeping.  If they do other
stuff (even if it's just pointlessly burning CPU, as the wabbit
variants do), they can still do that, as has been shown in this
thread.  Maybe they can even do it better if the system is not
spending time processing fork()s that will fail anyway.

The cases of webserver and SMTP processes doing work under overload
don't entirely convince me, since these processes make lots of
blocking syscalls and so sleep for considerable periods before
being rescheduled regardless.

People have suggested that a better fix might be in the scheduler.
Perhaps so. Let's look at this .5s tsleep as a "scheduler hint" for
the moment: perhaps a rather blunt Hint Of Arbitrary Size, but a
hint nonetheless. It says "this process is potentially causing
problems, and I want the scheduler to penalise it for a bit".

I'm sure we can refine the hinting mechanism, and perhaps also the
trigger conditions. Certainly I think a sysctl to tune the sleep
time (even to 0) is a good idea: since it is an arbitrary value,
it should be tunable.
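To make the tunable concrete, here is a minimal sketch of how a
millisecond setting might be turned into a tsleep() timeout, which is
given in clock ticks. The name fork_fail_sleep_ms and the helper are
hypothetical, not existing NetBSD code; a real version would export
the variable through sysctl(9) and call tsleep() in the fork failure
path only when the result is non-zero.

```c
#include <assert.h>

/* HYPOTHETICAL tunable, in milliseconds. 500 mirrors the present .5s
 * tsleep; 0 disables the penalty. A kernel would export it via sysctl. */
int fork_fail_sleep_ms = 500;

/* Convert the tunable to clock ticks for a given hz. Rounds up, so any
 * non-zero setting yields at least one tick. */
static int
fork_fail_sleep_ticks(int ms, int hz)
{
	assert(hz > 0 && ms >= 0);
	if (ms == 0)
		return 0;	/* tuned off: caller must skip tsleep() */
	return (ms * hz + 999) / 1000;
}
```

Rounding up matters because a tsleep() timeout of 0 means "no
timeout", i.e. sleep until woken; a small setting must never truncate
to 0 ticks, and the "tuned to 0" case must skip the sleep entirely
rather than pass 0 through.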

Other hints of various forms that might be considered:
 - making the size of the penalty depend on some other aspect of
   the situation, like whether the fork that fails is for the user
   with the most processes, or someone else.
 - moving the offender to the end of the run queue (suggested by
   christos)
 - tweaking some other scheduler variable within the present
   algorithm, like estcpu or nice.
 - adding some new "penalty" variable, that might be used generally
   to impose penalties in some other cases too, and that decays
   away as the process gets run later without triggering further
   problems.

Schedulers are subtle, highly optimised things.  If there's a way
to improve ours, by all means let's do so, with due caution.

One of the things that's attractive about the present "hint"
(whatever its flaws or controversies) is that it doesn't add any
complexity to the scheduler.  I'd like to see what we can do with
tuning the present hint before thinking about adding any more.

Time and real-world experience with the change are required, to
investigate some of the theorising here and see where it really
helps and hurts.  FreeBSD's experience with it seems to be positive
so far, for starters.

(And yes, if you try to make the scheduler recognise wabbits, you
can always design a tailor-made stealth attack wabbit that looks to
it like a "real" process.)

--
Dan.