Subject: Re: Re: Excessive swapping / Memory problems
To: Vincent van Scherpenseel <mailinglists@vanscherpenseel.nl>
From: Erik Berls <cyber@ono-sendai.com>
List: netbsd-users
Date: 09/06/2006 12:54:28
This really looks like it might be a periodic process that comes
along, consumes a large amount of memory, then exits.

Note the number of processes that have only 4k resident (likely the
pcb).  Do you see the process kills at a particular time of day?
Could it be something started by cron?  Or mysqld forking, with the
child then consuming large amounts of memory?
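
One rough way to check the time-of-day question is to bucket the kill
messages in /var/log/messages by hour.  The sketch below is only an
illustration: the function name kills_by_hour is made up, and it assumes
standard syslog timestamps ("Sep  6 03:15:02 ...") and that each kill
line contains the text "killed: out of swap".  Adjust the pattern if
your log lines look different.

```python
import re
from collections import Counter

# Assumed line layout (not taken from the original log):
#   "Sep  6 03:15:02 host /netbsd: UVM: pid 9747 (httpd), uid 1003 killed: out of swap"
KILL_RE = re.compile(r"^\w{3}\s+\d+\s+(\d{2}):\d{2}:\d{2}\s.*killed: out of swap")


def kills_by_hour(lines):
    """Count out-of-swap kill messages per hour of day (0-23)."""
    counts = Counter()
    for line in lines:
        match = KILL_RE.match(line)
        if match:
            # Group 1 is the two-digit hour from the syslog timestamp.
            counts[int(match.group(1))] += 1
    return counts


def print_histogram(path="/var/log/messages"):
    """Print one line per hour in which at least one kill was logged."""
    with open(path, errors="replace") as log:
        for hour, count in sorted(kills_by_hour(log).items()):
            print(f"{hour:02d}:00  {count}")
```

If most of the kills land in the same hour every day, compare that hour
against the crontab entries on the machine.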

If you've got a low-volume web site, you might reconsider how many
httpd processes you are running.  You might be able to drop from 10
down to 4.
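
To put a number on what the httpd processes cost, you can add up the
SIZE column from the top output.  This is a minimal sketch, assuming the
11-column layout shown in the quoted output below and sizes suffixed
with K, M or G; the helper names to_kib and total_size_kib are made up
for illustration.

```python
def to_kib(field):
    """Convert a top(1) size field such as '35M' or '3104K' to KiB."""
    units = {"K": 1, "M": 1024, "G": 1024 * 1024}
    return int(field[:-1]) * units[field[-1]]


def total_size_kib(top_lines, command):
    """Sum the SIZE column over all rows whose COMMAND equals `command`."""
    total = 0
    for line in top_lines:
        cols = line.split()
        # Expected columns:
        # PID USERNAME PRI NICE SIZE RES STATE TIME WCPU CPU COMMAND
        if len(cols) == 11 and cols[0].isdigit() and cols[10] == command:
            total += to_kib(cols[4])
    return total
```

For the eleven httpd rows quoted below this comes to 262840 KiB, or
about 257 MB.  SIZE is virtual size, and pages shared between the
processes are counted once per process, so treat that as an upper bound
rather than the real demand on RAM and swap.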

-=erik.


On 9/6/06, Vincent van Scherpenseel <mailinglists@vanscherpenseel.nl> wrote:
>
> Chuck Swiger wrote:
>
> >> Since last week I'm having problems with processes getting Killed out
> >> of nowhere on my NetBSD 1.6.2 machine. I searched the
> >> /var/log/messages log and learnt that this is due to the machine
> >> running out of swap.
> >
> > "top -o size" will show you which processes are requiring lots of
> > memory; it's possible that one of them is leaking, or it may simply be
> > that you are trying to run too much on a machine with limited RAM.
> > Increasing the amount of RAM in your machine is probably going to
> > improve the performance of the system significantly...
>
> The strange thing is that it's a very lightly loaded machine, only running
> an SMTP server for my daily mail (about 50 messages a day) and a
> webserver for my personal website (http://vincent.vanscherpenseel.nl).
> Most of the visitors are spambots targeting the blog comment system (I
> really need to implement a CAPTCHA check, I know).
>
>
>
> Here is the output of top -o size:
>
>    PID USERNAME PRI NICE   SIZE   RES STATE      TIME   WCPU    CPU COMMAND
>   9747 www       -4    0    35M    4K semwait    0:32  0.00%  0.00% httpd
>   9700 www       -4    0    35M    4K semwait    0:31  0.00%  0.00% httpd
>  11517 www       -4    0    34M    4K semwait    0:10  0.00%  0.00% httpd
>  12290 www       -4    0    33M    4K semwait    0:37  0.00%  0.00% httpd
>    363 mysql      2    4    33M 2884K select    25:29  0.00%  0.00% mysqld
>  11943 www       -4    0    32M    4K semwait    0:35  0.00%  0.00% httpd
>  27633 www       -4    0    28M    4K semwait    0:01  0.00%  0.00% httpd
>   9697 www       -4    0    27M    4K semwait    0:20  0.00%  0.00% httpd
>  11518 www       -4    0    24M 2204K semwait    0:07  0.00%  0.00% httpd
>  27634 www        2    0  3104K  280K select     0:00  0.00%  0.00% httpd
>  27635 www       -4    0  3080K 2572K semwait    0:00  0.00%  0.00% httpd
>    355 root       2    0  2704K  356K select     0:12  0.00%  0.00% httpd
>
>
>
> When I run lsof on the PIDs I see lots of open files on the disk, for example:
>
> httpd      9747      www    7w  VREG        0,4   2522038 100509 /var (/dev/wd0e)
> httpd      9747      www    8w  VREG        0,4   1275830 141107 /var (/dev/wd0e)
> httpd      9747      www    9w  VREG        0,4  13516788 120627 /var (/dev/wd0e)
> httpd      9747      www   10w  VREG        0,5   5679147 189052 /usr (/dev/wd0f)
> httpd      9747      www   11w  VREG        0,4   1209919 120602 /var (/dev/wd0e)
> httpd      9747      www   12w  VREG        0,4    134602 130645 /var (/dev/wd0e)
> httpd      9747      www   13w  VREG        0,4    828934 120603 /var (/dev/wd0e)
> httpd      9747      www   14w  VREG        0,4     35181 110546 /var (/dev/wd0e)
> httpd      9747      www   15w  VREG        0,4       893 110543 /var (/dev/wd0e)
> httpd      9747      www   16w  VREG        0,4     20214 100493 /var (/dev/wd0e)
> httpd      9747      www   17w  VREG        0,4     62516 120619 /var (/dev/wd0e)
> httpd      9747      www   18w  VREG        0,4         0 100528 /var (/dev/wd0e)
> httpd      9747      www   19w  VREG        0,4  17391685 100560 /var (/dev/wd0e)
> httpd      9747      www   20w  VREG        0,4     26591 100503 /var (/dev/wd0e)
> httpd      9747      www   21w  VREG        0,4      1016 110539 /var (/dev/wd0e)
> httpd      9747      www   22w  VREG        0,4     10444 100539 /var (/dev/wd0e)
> (There are many more entries for PID 9747; I didn't want to copy and
> paste them all.)
>
>
>
> dmesg says this regarding my hard disk drives:
>
> wd0 at pciide0 channel 0 drive 0: <MAXTOR 6L020J1>
> wd0: drive supports 16-sector PIO transfers, LBA addressing
> wd0: 19595 MB, 39813 cyl, 16 head, 63 sec, 512 bytes/sect x 40132503 sectors
> wd0: 32-bit data port
> wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
> wd1 at pciide0 channel 0 drive 1: <Maxtor 32049H2>
> wd1: drive supports 16-sector PIO transfers, LBA addressing
> wd1: 19541 MB, 39704 cyl, 16 head, 63 sec, 512 bytes/sect x 40021632 sectors
> wd1: 32-bit data port
> wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
> wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 2 (Ultra/33) (using DMA data transfers)
> wd1(pciide0:0:1): using PIO mode 4, Ultra-DMA mode 2 (Ultra/33) (using DMA data transfers)
> boot device: wd0
> root on wd0a dumps on wd0b
> wd0: transfer error, downgrading to Ultra-DMA mode 1
> wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 1 (using DMA data transfers)
> wd1(pciide0:0:1): using PIO mode 4, Ultra-DMA mode 2 (Ultra/33) (using DMA data transfers)
> wd0a: DMA error reading fsbn 3468096 of 3468096-3468111 (wd0 bn 3468159; cn 3440 tn 10 sn 9), retrying
> wd0: soft error (corrected)
> wd0: soft error (corrected)
>
>
>
> Could it be that my hard disk is somehow faulty, and that the processes
> hang on their calls to the disk, turning them into memory-consuming
> zombies?
>
> Thank you all for the help,
> Vincent van Scherpenseel
>