tech-userlevel: Re: Replacement for grep(1) (part 2)

Subject: Re: Replacement for grep(1) (part 2)
To: Daniel C. Sobral <dcs@newsguy.com>
From: Robert Elz <kre@munnari.OZ.AU>
List: tech-userlevel
Date: 07/14/1999 22:35:51
    Date:        Thu, 15 Jul 1999 00:53:17 +0900
    From:        "Daniel C. Sobral" <dcs@newsguy.com>
    Message-ID:  <378CB26D.C0BC0DBE@newsguy.com>

  | Would you care to name such systems?

munnari was one (the system of the From: header, even though this
mail isn't actually going anywhere near it).   I will describe it
a bit lower down.

  | And, btw, a system consuming
  | all memory is *not* necessarily approaching paging death.

No, of course not, though I didn't say all memory, I said all VM.
And while it is possible to have all VM consumed, and no paging activity
at all, that would tend to indicate insufficient VM allocated
(reaching an artificial barrier).

  | More
  | likely, it is just storing a lot of data in the swap which will
  | never be used (which is the whole point of overcommit in first
  | place), and, thus, never paged in.

The systems I describe were not using overcommit.  Further, I wouldn't
imagine that a system storing anything to swap would be overcommitting - as
I understand the term, overcommit only relates to allocating VM resources
which aren't backed by anything physical at all ("here's all this
address space you can play in if you like, but you had better not
actually do that, because if you do it won't work").   Either applied
to one process, as that wording suggests, or aggregated over the whole
system.   If a process was (for some stupid reason) loading a whole
bunch of data into the swap space, that would be committed VM, and you
have to have the resources to cope with it.

Now to munnari.   It no longer runs quite like this, but munnari is
an Alpha, 128MB, runs Digital UNIX (not in overcommit mode, either is
possible there).   At the time of which I speak it ran two principal
applications of note, innd with a VM footprint about 100MB, and named,
with a memory footprint (at the time) of about 90MB (as it is now, it
no longer runs innd, but its named has grown to > 120MB).

It also ran a bunch of small stuff (sendmail, typically 1 or 2 instances,
around 3MB each), ftpd (smaller, most often 0 or 1, sometimes 3 or 4)
and the occasional shell (a few hundred KB) plus init getty cron
syslog and all that associated noise with mem requirements approaching 0.

That's fine.  Well, not really fine, innd and named would fight each
other all day for who had how much of the real memory, and who was
relegated to swap, of which there was enough for all this to fit, but
not a lot more than that (enough for one of them to fork when it
needed to, that's all - not both at once, and yes, overcommit would
have allowed both at once, but that was not an aim).

Then, because it was running innd, it was also running the perl script
that summarises the log file, that could grow to 30MB, maybe more.

And because it is running sendmail, every now and then you get the
typical sendmail huge queue syndrome (at least for old sendmails, which
this was), where you get a dead site, a large queue of processes, and
a bunch of sendmails running the queue, spending most of their time
hung on connection attempts that aren't working, and gradually growing
bigger (maybe 8 or 10 processes at 15MB each).
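
To put rough numbers on the load described so far, here is a
back-of-the-envelope tally.  It is only a sketch: the figures are the
approximate ones quoted above (taking the upper end of each range), the
dictionary names are made up for illustration, and the small stuff
(ftpd, shells, init, getty, cron, syslog) is left out.

```python
# Approximate VM footprints in MB, taken from the figures quoted above.
RAM_MB = 128

# Steady-state load: innd, named, and two small sendmail instances.
steady = {"innd": 100, "named": 90, "sendmail": 2 * 3}

# Occasional extras: the perl log summary, and ten sendmail queue
# runners that have each grown to about 15MB.
bursts = {"perl log summary": 30, "sendmail queue runners": 10 * 15}

steady_total = sum(steady.values())
peak_total = steady_total + sum(bursts.values())

print(f"steady demand: {steady_total} MB ({steady_total / RAM_MB:.1f}x RAM)")
print(f"peak demand:   {peak_total} MB ({peak_total / RAM_MB:.1f}x RAM)")
```

Even the steady load exceeds real memory, and the peak is several times
larger, all of it real data rather than untouched address space.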

Somewhere amongst all of this swap would run out, and a good thing too,
as by this time the system really would be paging itself to oblivion.
Note that all this (large) VM I have described was filled with real data
(except for the odd times when innd or named had just forked), none of it
could be overcommitted and just ignored.   Whatever policy was in place,
the physical VM resources would have run out.

Now let's look at what happens with the two methods.

With all VM backed by real mem or swap space, processes go about allocating
memory - when there is no more left, the allocations start failing.
If the process is perl, it just collapses in a heap, and the log file
summary doesn't get made that day.   So sad...   If it's sendmail, it
issues "OS error, temporary failure" type responses, saves its queue files,
and exits.   A later sendmail will deliver those messages, no harm.
If it's a shell, who knows (I forget what the shells do, I think most just
keep trying, at least if interactive), but they consume mem at such a slow
rate it doesn't matter - fork() would typically fail though, so no new
processes could get started.   innd would just pause, and wait till a
bit later when mem might be available again (those perls and sendmails
all gone away).   named just the same (at least the named munnari ran).
They're the two processes munnari was supposed to be running - those two
don't just die.
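
The behaviour just described can be modelled with a toy sketch.  This is
not how any real kernel accounts for memory; StrictVM, on_failure and
the 300MB total are hypothetical, chosen only to show that with strict
accounting the failure arrives at allocation time, where each program
can react in its own way.

```python
class StrictVM:
    """Toy model: every allocation must be backed by RAM or swap up front."""

    def __init__(self, total_mb):
        self.total_mb = total_mb
        self.used_mb = 0

    def allocate(self, mb):
        # Refuse at allocation time, while the caller can still react.
        if self.used_mb + mb > self.total_mb:
            return False
        self.used_mb += mb
        return True

    def release(self, mb):
        self.used_mb -= mb


def on_failure(name):
    # Per-program reactions, paraphrasing the description above.
    return {
        "perl": "exits; no log summary today",
        "sendmail": "temporary failure; queue saved for a later run",
        "innd": "pauses and retries later",
        "named": "pauses and retries later",
    }[name]


vm = StrictVM(total_mb=300)   # assumed RAM + swap, for illustration only
vm.allocate(100)              # innd
vm.allocate(90)               # named

outcomes = []
requests = [("sendmail", 15)] * 6 + [("perl", 30), ("innd", 25)]
for name, mb in requests:
    if not vm.allocate(mb):
        outcomes.append((name, on_failure(name)))

# The sendmail queue runners finish and exit; innd's retry now succeeds.
vm.release(6 * 15)
retried = vm.allocate(25)

print(outcomes)
print("innd retry succeeded:", retried)
```

In this run the six sendmails fit, perl and innd are refused, and innd
gets its memory once the sendmails have gone away.  Nothing essential
was killed along the way.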

Now, with overcommit mode, we get an extra 30 seconds of life, because
no doubt there are a few pages floating around that have been allocated
to some process, but nothing has bothered to write into yet.   An extra 30
seconds if we're lucky (except if we followed the advice given here
earlier which would indicate that only 1/8 the amount of swap space would
be needed, in which case these processes would never have gotten started
in the first place).   After that short grace period, during which the
kernel has been happily answering requests for more VM with "sure, have
as much as you like", something needs an extra page of real storage,
there is none available, and we either deadlock, or die.   The approach
suggested here seems to find the biggest process (which here would be
innd or named) and kill -9 it.   No thanks.  Not an acceptable answer.
Sure it would get lots of VM back again, but the system would no longer
have been doing what it was supposed to be doing.   Adding more swap space
would be easy, but the wrong thing to do, that would just have allowed
the system to page itself to death, thrashing into eternity - having
processes go away is the only solution to this kind of problem.   Except
it needs to be the right processes, and "right" does not equal "big",
nor any other criterion the kernel could possibly figure out for itself.
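
A small sketch of why "biggest" and "right" differ.  The process table
and the essential set are hypothetical stand-ins built from the sizes
mentioned earlier, and biggest_first is only my reading of the
kill-the-largest policy, not anyone's actual implementation.

```python
# Sizes in MB at the moment a page of real storage cannot be found.
processes = {
    "innd": 100,
    "named": 90,
    "perl": 30,
    "sendmail-1": 15,
    "sendmail-2": 15,
}

# Knowledge the operator has and the kernel does not: what the machine
# is actually there to run.
essential = {"innd", "named"}


def biggest_first(procs):
    # Size-only policy: pick whichever process is largest.
    return max(procs, key=procs.get)


def operator_choice(procs, keep):
    # Pick the largest process that is not essential to the machine's job.
    expendable = {name: mb for name, mb in procs.items() if name not in keep}
    return max(expendable, key=expendable.get)


kernel_victim = biggest_first(processes)
operator_victim = operator_choice(processes, essential)

print("size-only policy kills:", kernel_victim)
print("operator would kill:   ", operator_victim)
```

The size-only policy picks innd, one of the two things the machine
exists to run, while the informed choice is the perl script.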

kre