Subject: Re: Re: Excessive swapping / Memory problems
To: Vincent van Scherpenseel <mailinglists@vanscherpenseel.nl>
From: Erik Berls <cyber@ono-sendai.com>
List: netbsd-users
Date: 09/06/2006 12:54:28
This really looks like it might be a periodic process that comes
along, consumes a large amount of memory, then exits.
Note the number of processes that have only 4K resident (likely just
the pcb). Do you see the process kills at a particular time of day?
Could it be something started by cron? Or mysqld forking, with the
child consuming large amounts of memory?
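
To check the time-of-day question, something like the following could
count the out-of-swap kills per hour. This is only a rough sketch: the
regular expression assumes the usual syslog timestamp prefix and that
the kernel message ends in "killed: out of swap", so compare it against
a real line from your /var/log/messages first. The function names are
made up for illustration.

```python
import re
import sys
from collections import Counter

# Assumed line shape (verify against your own log before relying on it):
#   Sep  6 03:15:02 host /netbsd: UVM: pid 9747 (httpd), uid 1002 killed: out of swap
KILL_RE = re.compile(r"^\w{3}\s+\d+\s+(\d{2}):\d{2}:\d{2}\s.*killed: out of swap")


def kills_by_hour(lines):
    """Return a Counter mapping hour of day (0-23) to number of kill messages."""
    hours = Counter()
    for line in lines:
        match = KILL_RE.match(line)
        if match:
            hours[int(match.group(1))] += 1
    return hours


def report(path):
    """Print one line per hour in which at least one kill was logged."""
    with open(path, errors="replace") as log:
        hours = kills_by_hour(log)
    for hour in sorted(hours):
        print(f"{hour:02d}:00  {hours[hour]} kill(s)")


# Only touches the filesystem when a log path is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    report(sys.argv[1])
```

Run it with the log path as its only argument. If the kills cluster
around one hour, compare that hour with the crontabs on the machine.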
If you've got a low-volume web site, you might reconsider how many
httpds you are running. You might be able to drop from 10 down to 4.
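
To see how much the httpds are holding in total, the SIZE and RES
columns of the top listing can be summed. A minimal sketch, assuming
the 11-column layout of the top output quoted below and sizes that are
either plain byte counts or suffixed with K, M or G; the helper names
are hypothetical and the sample rows are copied from that listing.

```python
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def to_bytes(field):
    """Convert a top size field such as '35M' or '2884K' to bytes."""
    if field[-1] in UNITS:
        return int(field[:-1]) * UNITS[field[-1]]
    return int(field)


def totals(top_lines, command="httpd"):
    """Return (process count, total SIZE, total RES) in bytes for one command."""
    count = size = res = 0
    for line in top_lines:
        cols = line.split()
        # Expected columns:
        # PID USERNAME PRI NICE SIZE RES STATE TIME WCPU CPU COMMAND
        if len(cols) == 11 and cols[0].isdigit() and cols[-1] == command:
            count += 1
            size += to_bytes(cols[4])
            res += to_bytes(cols[5])
    return count, size, res


if __name__ == "__main__":
    # A few rows taken from the top output in this thread.
    sample = [
        "9747 www -4 0 35M 4K semwait 0:32 0.00% 0.00% httpd",
        "363 mysql 2 4 33M 2884K select 25:29 0.00% 0.00% mysqld",
        "27634 www 2 0 3104K 280K select 0:00 0.00% 0.00% httpd",
    ]
    count, size, res = totals(sample)
    print(f"{count} httpd: SIZE {size // 1024}K, RES {res // 1024}K")
```

Fed the whole listing, the httpd rows should come to roughly 257M of
SIZE against a little over 5M resident, which suggests nearly all of
it is sitting in swap.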
-=erik.
On 9/6/06, Vincent van Scherpenseel <mailinglists@vanscherpenseel.nl> wrote:
>
> Chuck Swiger wrote:
>
> >> Since last week I'm having problems with processes getting Killed out
> >> of nowhere on my NetBSD 1.6.2 machine. I searched the
> >> /var/log/messages log and learnt that this is due to the machine
> >> running out of swap.
> >
> > "top -o size" will show you which processes are requiring lots of
> > memory; it's possible that one of them is leaking, or it may simply be
> > that you are trying to run too much on a machine with limited RAM.
> > Increasing the amount of RAM in your machine is probably going to
> > improve the performance of the system significantly...
>
> The strange thing is that it's a very low loaded machine, only running
> an SMTP server for my daily mail (about 50 messages a day) and a
> webserver for my personal website (http://vincent.vanscherpenseel.nl).
> Most of the visitors are spambots targeting the blog comment system (I
> really need to implement a CAPTCHA check, I know).
>
>
>
> Here is the output of top -o size:
>
>   PID USERNAME PRI NICE  SIZE   RES STATE    TIME  WCPU   CPU COMMAND
>  9747 www       -4    0   35M    4K semwait  0:32 0.00% 0.00% httpd
>  9700 www       -4    0   35M    4K semwait  0:31 0.00% 0.00% httpd
> 11517 www       -4    0   34M    4K semwait  0:10 0.00% 0.00% httpd
> 12290 www       -4    0   33M    4K semwait  0:37 0.00% 0.00% httpd
>   363 mysql      2    4   33M 2884K select  25:29 0.00% 0.00% mysqld
> 11943 www       -4    0   32M    4K semwait  0:35 0.00% 0.00% httpd
> 27633 www       -4    0   28M    4K semwait  0:01 0.00% 0.00% httpd
>  9697 www       -4    0   27M    4K semwait  0:20 0.00% 0.00% httpd
> 11518 www       -4    0   24M 2204K semwait  0:07 0.00% 0.00% httpd
> 27634 www        2    0 3104K  280K select   0:00 0.00% 0.00% httpd
> 27635 www       -4    0 3080K 2572K semwait  0:00 0.00% 0.00% httpd
>   355 root       2    0 2704K  356K select   0:12 0.00% 0.00% httpd
>
>
>
> When I lsof the PIDs I see lots of calls to the disk, for example:
>
> httpd 9747 www 7w VREG 0,4 2522038 100509 /var (/dev/wd0e)
> httpd 9747 www 8w VREG 0,4 1275830 141107 /var (/dev/wd0e)
> httpd 9747 www 9w VREG 0,4 13516788 120627 /var (/dev/wd0e)
> httpd 9747 www 10w VREG 0,5 5679147 189052 /usr (/dev/wd0f)
> httpd 9747 www 11w VREG 0,4 1209919 120602 /var (/dev/wd0e)
> httpd 9747 www 12w VREG 0,4 134602 130645 /var (/dev/wd0e)
> httpd 9747 www 13w VREG 0,4 828934 120603 /var (/dev/wd0e)
> httpd 9747 www 14w VREG 0,4 35181 110546 /var (/dev/wd0e)
> httpd 9747 www 15w VREG 0,4 893 110543 /var (/dev/wd0e)
> httpd 9747 www 16w VREG 0,4 20214 100493 /var (/dev/wd0e)
> httpd 9747 www 17w VREG 0,4 62516 120619 /var (/dev/wd0e)
> httpd 9747 www 18w VREG 0,4 0 100528 /var (/dev/wd0e)
> httpd 9747 www 19w VREG 0,4 17391685 100560 /var (/dev/wd0e)
> httpd 9747 www 20w VREG 0,4 26591 100503 /var (/dev/wd0e)
> httpd 9747 www 21w VREG 0,4 1016 110539 /var (/dev/wd0e)
> httpd 9747 www 22w VREG 0,4 10444 100539 /var (/dev/wd0e)
> (There are a lot more calls for PID 9747, didn't want to copy&paste them
> all)
>
>
>
> dmesg says this regarding my hard disk drives:
>
> wd0 at pciide0 channel 0 drive 0: <MAXTOR 6L020J1>
> wd0: drive supports 16-sector PIO transfers, LBA addressing
> wd0: 19595 MB, 39813 cyl, 16 head, 63 sec, 512 bytes/sect x 40132503 sectors
> wd0: 32-bit data port
> wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
> wd1 at pciide0 channel 0 drive 1: <Maxtor 32049H2>
> wd1: drive supports 16-sector PIO transfers, LBA addressing
> wd1: 19541 MB, 39704 cyl, 16 head, 63 sec, 512 bytes/sect x 40021632 sectors
> wd1: 32-bit data port
> wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
> wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 2 (Ultra/33) (using DMA data transfers)
> wd1(pciide0:0:1): using PIO mode 4, Ultra-DMA mode 2 (Ultra/33) (using DMA data transfers)
> boot device: wd0
> root on wd0a dumps on wd0b
> wd0: transfer error, downgrading to Ultra-DMA mode 1
> wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 1 (using DMA data transfers)
> wd1(pciide0:0:1): using PIO mode 4, Ultra-DMA mode 2 (Ultra/33) (using DMA data transfers)
> wd0a: DMA error reading fsbn 3468096 of 3468096-3468111 (wd0 bn 3468159; cn 3440 tn 10 sn 9), retrying
> wd0: soft error (corrected)
>
>
>
> Could it be that my hard disk is somehow faulty and the processes hang on
> the calls to the disk, turning them into memory-consuming zombies?
>
> Thank you all for the help,
> Vincent van Scherpenseel
>