Subject: Re: help me analyze my servers failure
To: port-mac68k <port-mac68k@netbsd.org>
From: Bob Nestor <rnestor@augustmail.com>
List: port-mac68k
Date: 04/09/2001 19:39:35
josh@ssimr.com

>On Mon, Apr 09, 2001 at 04:42:09PM -0700, Cameron Kaiser wrote:
>> > What happens is I can't get in, nor can any clients connect to access
>> > the services. It is still up on my network (judging from port-scans
>> > done with agnet tools from another Mac on my network). I have been
>> > running it headless, but when I stick the monitor back on - even
>> > though I can still get a display - I can't get a keyboard response. I
>> > find out either when I try to telnet into the server, or someone
>> > trying to check their mail in the house or send mail finds they can't.
>> 
>> There's not enough information here.
>
>That is correct. 
>
>Bob Nestor suggested I'm running out of swap space. I mention this here
>because I have the same problem testing that as I will everything
>else, which is that everything is very clean right now. If Bob is right,
>then I have an application slowly leaking memory. I'm running bind,
>sendmail, gnu-pop3 daemon from compiled source. The apache daemon is
>from a package at install time and the telnet daemon was done with the
>install. 

When I had the problem I discovered that things like telnet didn't work 
because the kernel had killed the daemons while trying to recover from a 
lack of swap space.  Basically, when the system runs out of swap it starts 
shedding processes in an attempt to free some up.  Unfortunately it seems 
to get the low-numbered processes first, like init (process 1).

I found the real culprit by leaving a session logged in on the console, 
in the hope that it would survive the process killer when the system 
locked up.  You might try this and/or leaving top running on the console 
to see if you can determine the real state of the system when it locks up.
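
If watching top by hand is impractical on a headless box, the same idea 
can be roughed out as a small logger that records which processes have 
grown since it started.  This is only a sketch under assumptions: it 
supposes Python is installed (nothing above says it is), that ps accepts 
"-axo pid,vsz,rss,comm" as BSD-style ps does, and the names parse_ps, 
growth, watch and memwatch.log are made up for illustration.

```python
import subprocess
import time


def parse_ps(text):
    """Parse `ps -axo pid,vsz,rss,comm` output into {pid: (vsz_kb, rss_kb, command)}."""
    procs = {}
    for line in text.strip().splitlines():
        fields = line.split(None, 3)
        # Skip the header row and anything that is not a full process line.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        pid, vsz, rss, comm = fields
        procs[int(pid)] = (int(vsz), int(rss), comm.strip())
    return procs


def growth(baseline, current):
    """Return (vsz_increase_kb, pid, command), largest first, for processes that grew."""
    grown = []
    for pid, (vsz, _rss, comm) in current.items():
        old = baseline.get(pid)
        # Require the same command name so a reused pid is not mistaken for a leak.
        if old is not None and old[2] == comm and vsz > old[0]:
            grown.append((vsz - old[0], pid, comm))
    return sorted(grown, reverse=True)


def snapshot():
    """Take one process snapshot (assumes a ps that supports -o with these keywords)."""
    result = subprocess.run(
        ["ps", "-axo", "pid,vsz,rss,comm"],
        capture_output=True, text=True, check=True,
    )
    return parse_ps(result.stdout)


def watch(interval=600, logfile="memwatch.log"):
    """Append a growth report to logfile every `interval` seconds (hypothetical defaults)."""
    baseline = snapshot()
    while True:
        time.sleep(interval)
        report = growth(baseline, snapshot())
        # Reopen the file on every pass so each report is closed out promptly.
        with open(logfile, "a") as log:
            log.write(time.strftime("%Y-%m-%d %H:%M:%S") + "\n")
            for delta, pid, comm in report:
                log.write("  pid %d %s: +%d KB\n" % (pid, comm, delta))
```

Calling watch() from the console session and reading memwatch.log after 
the next lockup should show whether one of the daemons grows steadily.  
The logger can be killed like anything else once swap is gone, so the 
last entries written before the lockup are the interesting ones.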

-bob