Subject: Re: Web load causes reboot
To: Frank van der Linden <frank@wins.uva.nl>
From: Dave Burgess <burgess@cynjut.neonramp.com>
List: current-users
Date: 08/03/1999 17:35:47
> 
> On Tue, Aug 03, 1999 at 12:59:56PM -0700, Len Burns wrote:
> > Indeed it is.  I can reproduce the same behavior here using a 1.4.1
> > kernel on i386.  
> 
> Could either of you give some more data, like the configurations
> of your system, the type of network that you're using, the client
> that the Apache bench program is run on (i.e. how fast is it), and
> the size of the file that is being transferred with each request?
> And perhaps the types of ethernet card that you are using.

I'm running into the same problem on mail2. 

My configuration is a web server running NetBSD 1.4 with about 200
virtual hosts.  The kernel is GENERIC plus "options GATEWAY", minus DDB,
EISA, and PCMCIA, on a Pentium 60 with 48 MB of memory and a WD8013
network card.

The most common trigger seems to be my NOC monitoring software.  Its
four pings succeed, but once it starts to probe the services I'm
watching, the machine randomly crashes (maybe once a week).  Since I'm
not running with DDB, it just boots back up like it's supposed to, and
it has never generated a crash dump.

I also received the MCLPOOL warning once (while I was running GENERIC)
but all it did that time was lock the computer solid.  As long as I
don't run out of NMBCLUSTERS, I don't see that error and the system
doesn't wedge.

> 
> I'd love to reproduce and fix it, because it's a serious bug that
> has been plaguing people using 1.4 or later a lot.

The fact that it seems to most commonly happen to me right after the ping
might be a clue, or it might be a red herring...

Note that I have 9 other servers, and all of them are running the same
basic kernel.  None of the others have this problem.  Two are name
servers and one is an NFS host.  All of them are managed using the
nocmonitor software, so that (alone) isn't it.  Two of them are running
Apache, so that (probably) isn't it either.

mail2 101> ruptime -a
admin         up 45+02:25,     1 user,   load 0.24, 0.27, 0.24
fax2mail      up 46+22:48,     0 users,  load 0.10, 0.09, 0.08
mail          up  1+02:36,     6 users,  load 0.41, 0.40, 0.40
mail2         up     6:12,     2 users,  load 1.25, 1.15, 1.16
ns1           up 16+18:50,     1 user,   load 0.12, 0.09, 0.08
quakeII       up 45+02:56,     1 user,   load 2.11, 2.09, 2.08
radius1       up 33+12:49,     0 users,  load 0.33, 0.44, 0.34
radius2       up 62+03:23,     1 user,   load 0.06, 0.19, 0.21
webserv01     up 33+12:41,     0 users,  load 0.06, 0.07, 0.07
webserv02     up  8+09:49,     1 user,   load 0.11, 0.08, 0.08

If you have any guesses, I'll be glad to discuss them in private E-Mail.

-- 
Dave Burgess                   Network Engineer - Nebraska On-Ramp, Inc.
*bsd FAQ Maintainer / SysAdmin for the NetBSD system in my spare bedroom
"Just because something is stupid doesn't mean there isn't someone that 
doesn't want to do it...."