NetBSD-Users archive


Finding bottlenecks on a proxy server



I'm using squid and dansguardian as a content-filtering web-proxy combo.

These are arranged in the chain:
clients -> squid (1) -> dansguardian -> squid (2) -> Internet

The first squid provides the most flexible filtering (time of day, MAC address, user authentication, client IP/port, etc.). It then speaks to dansguardian as an upstream proxy. dansguardian then speaks to the second squid instance, which does the actual fetching (some requests bypass dansguardian on the basis of certain access rules).

squid (1) runs as user squid, as a single process. Its process limits are:

proc.868.rlimit.cputime.soft = unlimited
proc.868.rlimit.cputime.hard = unlimited
proc.868.rlimit.filesize.soft = unlimited
proc.868.rlimit.filesize.hard = unlimited
proc.868.rlimit.datasize.soft = 8589934592
proc.868.rlimit.datasize.hard = 8589934592
proc.868.rlimit.stacksize.soft = 4194304
proc.868.rlimit.stacksize.hard = 134217728
proc.868.rlimit.coredumpsize.soft = unlimited
proc.868.rlimit.coredumpsize.hard = unlimited
proc.868.rlimit.memoryuse.soft = 6220451840
proc.868.rlimit.memoryuse.hard = 6220451840
proc.868.rlimit.memorylocked.soft = 2073483946
proc.868.rlimit.memorylocked.hard = 6220451840
proc.868.rlimit.maxproc.soft = 1024
proc.868.rlimit.maxproc.hard = 2068
proc.868.rlimit.descriptors.soft = 24576
proc.868.rlimit.descriptors.hard = 24576
proc.868.rlimit.sbsize.soft = unlimited
proc.868.rlimit.sbsize.hard = unlimited
proc.868.rlimit.vmemoryuse.soft = unlimited
proc.868.rlimit.vmemoryuse.hard = unlimited
proc.868.rlimit.maxlwp.soft = 1024
proc.868.rlimit.maxlwp.hard = 2048

dansguardian runs as user dangrdn, as a traditional forking parent/child pool. It listens on port 8124. There is a limit of 250 child processes. The parent process has the following limits:

proc.1231.rlimit.cputime.soft = unlimited
proc.1231.rlimit.cputime.hard = unlimited
proc.1231.rlimit.filesize.soft = unlimited
proc.1231.rlimit.filesize.hard = unlimited
proc.1231.rlimit.datasize.soft = 268435456
proc.1231.rlimit.datasize.hard = 8589934592
proc.1231.rlimit.stacksize.soft = 4194304
proc.1231.rlimit.stacksize.hard = 134217728
proc.1231.rlimit.coredumpsize.soft = unlimited
proc.1231.rlimit.coredumpsize.hard = unlimited
proc.1231.rlimit.memoryuse.soft = 6220451840
proc.1231.rlimit.memoryuse.hard = 6220451840
proc.1231.rlimit.memorylocked.soft = 2073483946
proc.1231.rlimit.memorylocked.hard = 6220451840
proc.1231.rlimit.maxproc.soft = 320
proc.1231.rlimit.maxproc.hard = 320
proc.1231.rlimit.descriptors.soft = 320
proc.1231.rlimit.descriptors.hard = 320
proc.1231.rlimit.sbsize.soft = unlimited
proc.1231.rlimit.sbsize.hard = unlimited
proc.1231.rlimit.vmemoryuse.soft = unlimited
proc.1231.rlimit.vmemoryuse.hard = unlimited
proc.1231.rlimit.maxlwp.soft = 1024
proc.1231.rlimit.maxlwp.hard = 2048
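
For comparing limit listings like the ones above without reading them line by line, something along these lines might help. It is only a sketch: the helper names `parse_rlimits` and `soft_below_hard` are made up for illustration, and it assumes the input is text in exactly the `proc.<pid>.rlimit.<name>.<soft|hard> = value` form shown above.

```python
import re

# Matches lines such as "proc.1231.rlimit.descriptors.soft = 320".
LINE_RE = re.compile(r"^proc\.(\d+)\.rlimit\.(\w+)\.(soft|hard)\s*=\s*(\S+)$")

def parse_rlimits(text):
    """Return {pid: {name: {"soft": v, "hard": v}}}; None stands for unlimited."""
    limits = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not an rlimit line
        pid, name, kind, raw = m.groups()
        value = None if raw == "unlimited" else int(raw)
        limits.setdefault(int(pid), {}).setdefault(name, {})[kind] = value
    return limits

def soft_below_hard(limits_for_pid):
    """List limit names whose soft value is lower than the hard value."""
    names = []
    for name, pair in sorted(limits_for_pid.items()):
        soft, hard = pair.get("soft"), pair.get("hard")
        if soft is not None and (hard is None or soft < hard):
            names.append(name)
    return names

# A few lines copied from the dansguardian parent listing.
sample = """\
proc.1231.rlimit.datasize.soft = 268435456
proc.1231.rlimit.datasize.hard = 8589934592
proc.1231.rlimit.descriptors.soft = 320
proc.1231.rlimit.descriptors.hard = 320
proc.1231.rlimit.sbsize.soft = unlimited
proc.1231.rlimit.sbsize.hard = unlimited
"""
parsed = parse_rlimits(sample)
print(soft_below_hard(parsed[1231]))
```

Limits reported this way are ones the process could still raise itself, which may be worth knowing before changing anything system-wide.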

The parent process has a unix socket connection to each child and a few extra descriptors for log files, etc. It appears to use 6 more descriptors than the current number of children, so the limits above should be ample. Each child uses 10 (if dormant) or 12 (if active) descriptors, and the children inherit the above limits.
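
The descriptor arithmetic behind "should be ample" can be spelled out as below. This is a back-of-envelope check using only the figures quoted in this post; the constant and function names are illustrative, and the "children + 6" rule is an observation, not something taken from the dansguardian source.

```python
MAX_CHILDREN = 250          # dansguardian child limit
PARENT_OVERHEAD = 6         # observed: parent holds about 6 more than current children
DESCRIPTOR_LIMIT = 320      # descriptors soft and hard limit from the listing
FDS_PER_ACTIVE_CHILD = 12   # observed per-child usage when active

def parent_fds(children, overhead=PARENT_OVERHEAD):
    """Estimated descriptors held by the parent for a given number of children."""
    return children + overhead

worst_parent = parent_fds(MAX_CHILDREN)
parent_headroom = DESCRIPTOR_LIMIT - worst_parent
child_headroom = DESCRIPTOR_LIMIT - FDS_PER_ACTIVE_CHILD

print("parent descriptors at full pool:", worst_parent)   # 250 + 6 = 256
print("parent headroom:", parent_headroom)                # 320 - 256 = 64
print("active child headroom:", child_headroom)           # 320 - 12 = 308
```

On these numbers neither the parent nor any single child comes close to its own per-process descriptor limit, even with the pool full.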

The second squid process runs as user nobody and is (currently) configured to do no caching or access logging. I'll refer to it as squidnc (nc = no cache). It listens on port 8123. It has the same process limits as the first squid process.

At busy times, the dansguardian processes stack up and hit the 250 limit. Web access then slows to a crawl as requests queue waiting for a free child. At that point users start shouting and time to investigate is short.

Both squid processes have a similar number of file descriptors open. The main squid is around 40 higher, reflecting the fact that it is manipulating its cache and has a few helpers running. CPU usage is low. top shows the squids are in kqueue state. Inbound bandwidth is not the limiting factor.

squidnc does not log anything. dansguardian logs:

dansguardian[11094]: Error 9 (Bad file descriptor) connecting to proxy 127.0.0.1:8123 by client 10.4.4.2
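
To see how often this error occurs and against which proxy address, the log lines could be tallied with a small script. This is a sketch that assumes every relevant line has the same shape as the single example above; the pattern and the name `tally_proxy_errors` are illustrative and not taken from dansguardian itself.

```python
import re
from collections import Counter

# Pattern derived from the one log line quoted above; other formats are not handled.
LOG_RE = re.compile(
    r"dansguardian\[(?P<pid>\d+)\]: Error (?P<errno>\d+) \((?P<msg>[^)]*)\) "
    r"connecting to proxy (?P<proxy>\S+) by client (?P<client>\S+)"
)

def tally_proxy_errors(lines):
    """Count (errno, message, proxy) combinations across log lines."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.search(line)
        if m:
            counts[(int(m["errno"]), m["msg"], m["proxy"])] += 1
    return counts

sample = [
    "dansguardian[11094]: Error 9 (Bad file descriptor) connecting to proxy "
    "127.0.0.1:8123 by client 10.4.4.2",
]
for key, count in tally_proxy_errors(sample).items():
    print(count, key)
```

Feeding it the real log for a busy period would show whether the errors cluster at the times the child pool fills up.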

My theory is that squidnc is the bottleneck (even though it is doing the least work); however, I do not have any hard evidence of this. I am looking for help with finding and fixing any such bottlenecks or, if I'm looking in entirely the wrong place, suggestions of better places to look.
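
One cheap way to gather some evidence during a busy period would be to time plain TCP connects to each listener. The following is a minimal sketch, assuming the ports given above (8124 for dansguardian, 8123 for squidnc) on 127.0.0.1; `time_connect` and `sample_connects` are made-up helper names. A slow or failed connect would only hint at a full listen queue, since a connect can succeed even when the process behind it is slow to accept, so this is not conclusive on its own.

```python
import socket
import time

def time_connect(host, port, timeout=5.0):
    """Return the seconds taken to complete a TCP connect, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None  # refused, timed out, or otherwise failed

def sample_connects(host, port, count=10, pause=0.1):
    """Make several connect attempts and return the list of timings."""
    results = []
    for _ in range(count):
        results.append(time_connect(host, port))
        time.sleep(pause)
    return results

if __name__ == "__main__":
    # Ports as described in this post; adjust if the setup differs.
    for name, port in (("dansguardian", 8124), ("squidnc", 8123)):
        times = sample_connects("127.0.0.1", port)
        ok = [t for t in times if t is not None]
        worst = max(ok) if ok else None
        print(name, "failures:", len(times) - len(ok), "worst connect (s):", worst)
```

Running it once while things are quiet and again while the child pool is full would give two sets of numbers to compare.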

All on NetBSD 7.2_STABLE amd64 with adequate RAM.

--
Stephen


