Subject: Re: About NetBSD server tuning!
To: Jaromír Dolecek <jdolecek@netbsd.org>
From: None <sudog@sudog.com>
List: port-i386
Date: 02/21/2001 11:53:34
> Are there any syslog entries like "proc: table is full - increase
> kern.maxproc or NPROC" ? Couple of subsystems report when they
> reach the limit of resources and hint what should be raised.
>
> You probably want to bump MAXUSERS to 64, too.
>
> Jaromir

There were, once upon a time, these problems. Currently I've got
kern.maxproc set pretty high to accommodate the vast hordes of people
who are nailing the sites. For instance, I've got

kern.maxproc = 4096
kern.maxfiles = 8196
kern.nmbclusters = 8192
vm.nkmempages: 16280

Now, this machine in particular has only 256MB of RAM, but perhaps
1.5x that in swap.
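
For reference, those are just sysctls and can be poked at runtime. A
minimal sketch of how I look at and raise them, assuming the stock
NetBSD sysctl(8) (the values are simply the ones I happen to run, and
I'm assuming NKMEMPAGES is the kernel option behind vm.nkmempages):

  # read the current limits
  sysctl kern.maxproc kern.maxfiles kern.nmbclusters vm.nkmempages

  # the process/file limits can be raised on the fly as root...
  sysctl -w kern.maxproc=4096
  sysctl -w kern.maxfiles=8196

  # ...while the mbuf-cluster and kernel-memory sizing I treat as
  # kernel options (NMBCLUSTERS, NKMEMPAGES) plus a rebuild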

What I've noticed is that there are three thresholds that I'm
currently trying to "feel" out.

1. The maximum number of processes the machine is sysctl'd to allow
(kern.maxproc).

2. The maximum number of apache/cgi processes that the server software
is limited to by shell ulimits.

3. The actual number of apache processes that apache limits itself to.
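
To make that concrete, here is roughly where each of those three knobs
lives on my setup. This is only a sketch: the "www" login class name
and all the numbers are placeholders, not my real config.

  # 1. the kernel-wide ceiling
  sysctl -w kern.maxproc=4096

  # 2. the per-user ceiling for the uid apache runs as, via
  #    /etc/login.conf (then: cap_mkdb /etc/login.conf)
  www:\
          :maxproc=1024:\
          :openfiles=4096:\
          :tc=default:

  # 3. apache's own ceiling, in httpd.conf
  MaxClients 900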

I've noticed some patterns.

#2 must not exceed #1 minus x%, where x is some small single-digit
number. If it gets any closer to #1 than that, things choke and I need
to reboot the machine. (This load is driven mostly by users clicking
and clicking and hitting refresh, etc.)
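
A rough worked version of that margin, using my kern.maxproc (the 5%
figure and the count for "everything else" are illustrative guesses,
not measurements):

  kern.maxproc                       4096
  minus ~5% safety margin           - ~205
  minus sshd, cron, syslogd, etc.   -  ~90
  ----------------------------------------
  ceiling for the apache+cgi ulimit  ~3800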

If #3 (and corresponding perl cgi children) exceeds #2, apache stalls
and is no longer capable of handling cgi. It must be SIGHUP'd to be
corrected.
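
The apache side of that is roughly this chunk of httpd.conf (Apache
1.3 directives; the numbers are illustrative, and pushing MaxClients
above 256 means rebuilding apache with a larger HARD_SERVER_LIMIT):

  MaxClients           900   # keep this (plus CGI children) under the ulimit
  MaxRequestsPerChild 1000   # recycle children so leaky perl CGIs don't pile up
  StartServers          20
  MinSpareServers       10
  MaxSpareServers       50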

It's a juggling act once the kernel is tuned to be capable of dealing
with the huge limits and doesn't panic any more under the loads.

Some guidelines I've been forced to find out for myself:

1. RAIDFrame is a no-no on the heavily accessed drives. The huge
volume of small reads and tiny writes they see can only be handled on
a normal ffs-based partition. So, to keep data loss to a minimum,
regular backups must be made instead.

2. With RAM at a premium, another balance must be struck: file system
caching versus RAM for everything else. I could increase the amount of
memory dedicated to file system caching, but this needs to be weighed
against how many users are connected at once. Too much cache, and
memory is wasted and the machine isn't living up to its potential; too
little, and the users can starve the system. (There's a rough
kernel-config sketch for this after the list.)

3. SCSI, SCSI, SCSI. Anything less and you're spinning your wheels.
Preferably Seagate's new 15k Cheetah drives. Now those are nice. The
Quantum Atlas V 10k drives are nice too, but hella loud. Sounds like a
jet fighter throttling up!
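
As promised above, the kernel-config side of point 2 looks roughly
like this. It's a sketch, not my actual config: I'm assuming BUFPAGES
is still the knob for buffer cache sizing and that NKMEMPAGES is the
option behind vm.nkmempages, and the numbers are only illustrative for
a 256MB i386 box.

  maxusers        64                # as Jaromir suggested
  options         BUFPAGES=6144     # ~24MB of 4k pages for the buffer cache
  options         NKMEMPAGES=16384  # kernel malloc arena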

------------***

I really need to figure out some specific equations to decide where to
set these limits: do some testing or profiling to find out how long
each perl child takes on average, how much it accesses the disk and in
what patterns, how much of that the system can actually handle, and
what ratio of requests hit the CGI as opposed to plain HTML links, and
then work all of it into the daily traffic.
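
Something like this back-of-the-envelope calculation is what I'm
groping toward (every number below is made up for illustration; the
per-child sizes would have to come from real profiling with ps/top):

  RAM left for apache+cgi = 256MB - kernel - fs cache   ~ 180MB
  average RSS per perl CGI child                        ~   8MB
  average RSS per plain httpd child                     ~   2MB
  CGI : plain-HTML request ratio                        ~   1:4

  children before swapping ~ 180 / (0.2*8 + 0.8*2) ~ 56 concurrent

and then fold the daily traffic curve on top of that.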

This is complicated enough that I was hoping another user here could
offer some common server configurations, based on available RAM and
disk type (IDE vs. SCSI), that they've been able to run without
worrying about things getting out of control.

Thank you for your time,

Sincerely,

Marc Tooley