Subject: Stopping a runaway VAX
To: None <port-vax@netbsd.org>
From: John Klos <john@sixgirls.org>
List: port-vax
Date: 08/09/2001 03:05:53
Hello, all,

I did something silly recently, and I'd like to know how to prevent the
predicament I put myself in.

I have some stats programs that take up to half an hour to run, so
I run one program an hour. However, I added these to my crontab
improperly, and ended up spawning a new Perl interpreter every minute.
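
(To illustrate the failure mode - the path here is made up - the
difference is a single field:)

    # what I meant: run at minute 0 of every hour
    0 * * * *    /usr/bin/perl /home/john/stats/report.pl
    # what I effectively had: run every minute
    * * * * *    /usr/bin/perl /home/john/stats/report.pl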

By the time I realised my mistake, it was the next morning, and I'd bet
the load average of the machine was in the hundreds. Since I don't use
telnet, my only access is through ssh, which didn't work despite leaving
the login attempt for several hours to see if it would eventually get
around to it (perhaps the hourly ssh server key regeneration was never
completing, either).

Anyway, I tried scp'ing some files in place of the Perl scripts, but that
didn't work either (the scp jobs eventually died without having
transferred anything). The web server, however, continued to work
throughout.

So my question is this: what is the simplest, most elegant way to have a
system task run when the load average gets too high? Also, is there an
easy way to have cron skip a job if the previous instance of that job is
still running? Or would a little external logic, like the sketches below,
be in order?
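
(For the overlap problem, a lockfile wrapper is the kind of external
logic I have in mind; the script name and paths here are made up:)

    #!/bin/sh
    # run the stats job only if the previous run has finished;
    # mkdir is atomic, so the directory doubles as a lock
    LOCKDIR=/tmp/stats.lock
    if mkdir "$LOCKDIR" 2>/dev/null; then
        trap 'rmdir "$LOCKDIR"' 0    # clean up the lock on exit
        /usr/bin/perl /home/john/stats/report.pl
    else
        exit 0    # previous run (or a stale lock) - skip this one
    fi

And for the load-average side, roughly something run from cron that
checks uptime(1) before doing anything drastic:

    #!/bin/sh
    # act only when the 1-minute load average crosses a threshold
    LOAD=`uptime | awk '{print int($(NF-2))}'`
    if [ "$LOAD" -ge 50 ]; then
        logger -t loadwatch "load average is $LOAD"
        # e.g. kill off the runaway jobs here
    fi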

Finally, on REALLY slow machines (such as ours), is there a way to set
sshd to regenerate its server key only if the load average isn't too
high? If not, there should be.
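
(The hourly rebuild I mean is sshd's regeneration of the ephemeral
protocol 1 server key; as far as I know, the only knob OpenSSH offers is
the interval itself, in sshd_config - the path may vary:)

    # sshd_config: regenerate the ephemeral server key after this
    # many seconds if it has been used; 0 disables regeneration
    KeyRegenerationInterval 3600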

Ideas? Thoughts? Comments?

BTW - I eventually had to power cycle the machine. Now I have a tty
running on the serial port in case something like this happens again.

Thanks,
John Klos