Subject: Re: help me analyze my servers failure
To: port-mac68k <port-mac68k@netbsd.org>
From: None <josh@ssimr.com>
List: port-mac68k
Date: 04/13/2001 21:31:58
I'm trying to follow up and work out what made my server fail. It's
been up for about a week now. I keep top running in a telnet session
from another machine on the network, and I also e-mail the output of
top and swapctl to a few addresses.

As far as I can tell, the machine:

1. Uses between 40 and 46M of the 68M available.

2. Never touches swap.

3. Runs about 25 to 30 processes.

It is stable as a rock while I watch it.

I'm starting to wonder whether some external factor was involved.
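
The monitoring described above (periodic top and swapctl output) could
also be kept on the server itself, so the last readings before a lock-up
survive even if mail and telnet are down. A minimal sketch, assuming
Python 3.7 or later is installed; the command flags, the log path, and
the names snapshot and run_forever are illustrative assumptions, not
what was actually run on this machine:

```python
import os
import subprocess
import time

# Hypothetical defaults: the flags and the log path are assumptions,
# adjust them for your own system.
COMMANDS = [["top", "-b"], ["swapctl", "-l"]]
LOG_PATH = "/var/log/health.log"


def snapshot(commands, log_path):
    """Append one timestamped record per command, then force it to disk."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(log_path, "a") as log:
        for cmd in commands:
            try:
                result = subprocess.run(
                    cmd, capture_output=True, text=True, timeout=30
                )
                body = result.stdout + result.stderr
            except (OSError, subprocess.TimeoutExpired) as exc:
                # A missing or hung command is itself worth recording.
                body = "failed: %s\n" % exc
            log.write("=== %s  %s ===\n%s\n" % (stamp, " ".join(cmd), body))
        # Flush and fsync so the record is on disk before any lock-up.
        log.flush()
        os.fsync(log.fileno())


def run_forever(interval=300):
    """Take a snapshot every `interval` seconds until interrupted."""
    while True:
        snapshot(COMMANDS, LOG_PATH)
        time.sleep(interval)
```

Calling run_forever() from a session left open on the console would
leave a trail in the log to read after the next failure; the five-minute
interval is an arbitrary choice.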

On Mon, Apr 09, 2001 at 07:39:35PM -0500, Bob Nestor wrote:
> josh@ssimr.com
> 
> >On Mon, Apr 09, 2001 at 04:42:09PM -0700, Cameron Kaiser wrote:
> >> > What happens is I can't get in, nor can any clients connect to access
> >> > the services. It is still up on my network (judging from port-scans
> >> > done with agnet tools from another Mac on my network). I have been
> >> > running it headless, but when I stick the monitor back on - even
> >> > though I can still get a display - I can't get a keyboard response. I
> >> > find out either when I try to telnet into the server, or someone
> >> > trying to check their mail in the house or send mail finds they can't.
> >> 
> >> There's not enough information here.
> >
> >That is correct. 
> >
> >Bob Nestor suggested I'm running out of swap space. I mention this here
> >because I have the same problem testing that as I will with everything
> >else, which is that everything is very clean right now. If Bob is right,
> >then I have an application slowly leaking memory. I'm running bind,
> >sendmail, and the gnu-pop3 daemon from compiled source. The apache daemon
> >is from a package at install time and the telnet daemon came with the
> >install.
> 
> When I had the problem I discovered that things like telnet didn't work 
> because the daemons had been killed as part of the effort by the kernel 
> to recover from lack of SWAP space.  Basically when the system runs out 
> of SWAP space it starts shedding processes in an attempt to free up SWAP.
> Unfortunately it seems to get the low-numbered processes first, like the
> init process "1".
> 
> I found the real culprit by leaving an open session running on the 
> console that would hopefully survive the process killer when the system 
> locked up.  You might try this and/or leaving top running on the console
> to see if you can determine the real state of the system when it locks up.
> 
> -bob

-- 
Josh Kuperman
josh@ssimr.com
http://www.ssimr.com