Subject: Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
To: Charles M. Hannum <root@ihack.net>
From: Matthew Dillon <dillon@apollo.backplane.com>
List: tech-userlevel
Date: 07/13/1999 18:32:36
:
:>     Has your simulation ever been kicked by the kernel due to lack of
:>     swap space?
:
:I already said so.  Please at least pretend to read what I wrote
:before replying.
:
:There is a big difference here between a piddling web server and a
:1000-hour simulation.  If the web server goes down, you reboot it,
:maybe a few users are inconvenienced meanwhile, and maybe you lose
:some advertising revenue.  If the simulation has to be restarted,
:you've lost *valuable* computing time that is not easy to replace.
:
:There are many environments where even the possibility of the
:simulation crashing due to external influence is unacceptable.  I find
:it sad that you resist making FreeBSD robust against such problems,
:but that's your concern.

    Sigh.  If the simulation is so important to you and your system does
    not have sufficient swap, maybe you should consider fixing your system
    rather than blaming the people who wrote it.  Or perhaps you should
    consider checkpointing the code if you aren't willing to look for
    easy solutions to the problem.  Unless all the users on the system are
    working against you, no one user with a runaway should be able to run
    a properly configured system out of swap by accident.  If your users
    are doing it on purpose then maybe you should find a different machine 
    to work on, eh?  
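
    To make the checkpointing suggestion concrete, here is a minimal
    sketch, assuming a simulation whose state fits in one picklable
    object.  The names save_checkpoint, load_checkpoint and run, and the
    toy accumulator standing in for the real work, are hypothetical and
    not taken from any particular simulation code.

```python
import os
import pickle
import tempfile


def save_checkpoint(state, path):
    """Write state to path via a temp file and rename, so a kill
    mid-write never leaves a truncated checkpoint behind."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # atomic replacement on POSIX filesystems
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def load_checkpoint(path, default):
    """Return the saved state, or default if no checkpoint exists yet."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return default


def run(path, total_steps, every=100):
    """Toy simulation loop: resumes from the last checkpoint if present."""
    state = load_checkpoint(path, {"step": 0, "acc": 0})
    while state["step"] < total_steps:
        state["acc"] += state["step"]  # stand-in for the real computation
        state["step"] += 1
        if state["step"] % every == 0:
            save_checkpoint(state, path)
    save_checkpoint(state, path)
    return state
```

    With something like this in place, a process killed for lack of swap
    loses at most the work done since the last checkpoint interval
    rather than the whole run.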

    In a cooperative environment it is extremely easy to prevent accidental
    runaways from eating a system's swap up, and still fairly easy to reduce
    the damage done by purposeful attacks.  In fact, at BEST we set soft
    limits for most of the system resources to values reasonable enough
    that users don't need to change them, and that has protected 25
    machines and 30,000 users for several years.
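
    As an illustration of the soft-limit idea (a sketch only, not the
    actual BEST configuration, which is not shown here), a launcher
    could lower a soft resource limit before starting user work.  The
    helper name cap_soft_limit and the 512 MB figure are assumptions
    made for the example; it uses Python's standard resource module,
    which wraps getrlimit(2)/setrlimit(2).

```python
import resource


def cap_soft_limit(which, soft_cap):
    """Set the soft limit for resource `which` to soft_cap, leaving the
    hard limit untouched so the user can still raise the soft limit."""
    _, hard = resource.getrlimit(which)
    # A soft limit may never exceed a finite hard limit.
    if hard != resource.RLIM_INFINITY:
        soft_cap = min(soft_cap, hard)
    resource.setrlimit(which, (soft_cap, hard))
    return resource.getrlimit(which)


if __name__ == "__main__":
    # Hypothetical policy: cap the data segment at 512 MB (soft) so an
    # accidental runaway fails its allocation instead of eating swap.
    print(cap_soft_limit(resource.RLIMIT_DATA, 512 * 1024 * 1024))
```

    Because only the soft limit changes, a user who genuinely needs
    more can raise it up to the hard limit, while an accidental runaway
    hits the cap first.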

    If you want help in fixing your system, we can talk over private email.

    If you are looking for a magical overcommit solution you are going to
    be looking for a long time.  It isn't going to happen, because I doubt
    it would even come close to fixing your problems even if it were 
    available.

    If you are looking to blame overcommits for your problems, then lay out
    how your system is set up.  But I'll bet you the problem is something
    less severe -- like a simple misconfiguration, or perhaps insufficient
    swap.  How much swap is on this system, by the way?

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>