Subject: Re: Excessive swapping / Memory problems
To: Chuck Swiger <cswiger@mac.com>
From: Vincent van Scherpenseel <mailinglists@vanscherpenseel.nl>
List: netbsd-users
Date: 09/06/2006 21:40:55
Chuck Swiger wrote:
>> Since last week I'm having problems with processes getting Killed out
>> of nowhere on my NetBSD 1.6.2 machine. I searched the
>> /var/log/messages log and learnt that this is due to the machine
>> running out of swap.
>
> "top -o size" will show you which processes are requiring lots of
> memory; it's possible that one of them is leaking, or it may simply be
> that you are trying to run too much on a machine with limited RAM.
> Increasing the amount of RAM in your machine is probably going to
> improve the performance of the system significantly...
The strange thing is that it's a very lightly loaded machine, running only
an SMTP server for my daily mail (about 50 messages a day) and a
webserver for my personal website (http://vincent.vanscherpenseel.nl).
Most of the visitors are spambots targeting the blog comment system (I
really need to implement a CAPTCHA check, I know).
Here is the output of top -o size:
  PID USERNAME PRI NICE   SIZE    RES STATE     TIME   WCPU    CPU COMMAND
 9747 www       -4    0    35M     4K semwait   0:32  0.00%  0.00% httpd
 9700 www       -4    0    35M     4K semwait   0:31  0.00%  0.00% httpd
11517 www       -4    0    34M     4K semwait   0:10  0.00%  0.00% httpd
12290 www       -4    0    33M     4K semwait   0:37  0.00%  0.00% httpd
  363 mysql      2    4    33M  2884K select   25:29  0.00%  0.00% mysqld
11943 www       -4    0    32M     4K semwait   0:35  0.00%  0.00% httpd
27633 www       -4    0    28M     4K semwait   0:01  0.00%  0.00% httpd
 9697 www       -4    0    27M     4K semwait   0:20  0.00%  0.00% httpd
11518 www       -4    0    24M  2204K semwait   0:07  0.00%  0.00% httpd
27634 www        2    0  3104K   280K select    0:00  0.00%  0.00% httpd
27635 www       -4    0  3080K  2572K semwait   0:00  0.00%  0.00% httpd
  355 root       2    0  2704K   356K select    0:12  0.00%  0.00% httpd
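To put rough numbers on the table above, here is a small Python sketch of my own (the helper names are made up, not part of top) that totals the SIZE and RES columns per command. It assumes the 11-column layout shown and sizes suffixed with K or M. Summing SIZE over the httpd children overstates the real memory demand, since they share libraries and pages inherited from the parent, so treat that total as an upper bound; the more telling detail is the 4K RES next to a 35M SIZE, which suggests most of those pages are no longer resident.

```python
# Sketch: total the SIZE (virtual) and RES (resident) columns of the
# "top -o size" rows above per command. Assumes the 11-column layout
# PID USERNAME PRI NICE SIZE RES STATE TIME WCPU CPU COMMAND and sizes
# suffixed with K or M. Helper names are hypothetical.
from collections import defaultdict

TOP_ROWS = """\
 9747 www       -4    0    35M     4K semwait   0:32  0.00%  0.00% httpd
 9700 www       -4    0    35M     4K semwait   0:31  0.00%  0.00% httpd
11517 www       -4    0    34M     4K semwait   0:10  0.00%  0.00% httpd
12290 www       -4    0    33M     4K semwait   0:37  0.00%  0.00% httpd
  363 mysql      2    4    33M  2884K select   25:29  0.00%  0.00% mysqld
11943 www       -4    0    32M     4K semwait   0:35  0.00%  0.00% httpd
27633 www       -4    0    28M     4K semwait   0:01  0.00%  0.00% httpd
 9697 www       -4    0    27M     4K semwait   0:20  0.00%  0.00% httpd
11518 www       -4    0    24M  2204K semwait   0:07  0.00%  0.00% httpd
27634 www        2    0  3104K   280K select    0:00  0.00%  0.00% httpd
27635 www       -4    0  3080K  2572K semwait   0:00  0.00%  0.00% httpd
  355 root       2    0  2704K   356K select    0:12  0.00%  0.00% httpd
"""

UNIT_KB = {"K": 1, "M": 1024, "G": 1024 * 1024}


def to_kb(value):
    """Convert a top size field such as '35M' or '2884K' to kilobytes."""
    return int(value[:-1]) * UNIT_KB[value[-1]]


def totals_by_command(rows):
    """Return {command: (total SIZE in KB, total RES in KB)}."""
    size = defaultdict(int)
    res = defaultdict(int)
    for line in rows.splitlines():
        fields = line.split()
        if len(fields) != 11:  # skip blank or malformed lines
            continue
        command = fields[10]
        size[command] += to_kb(fields[4])
        res[command] += to_kb(fields[5])
    return {cmd: (size[cmd], res[cmd]) for cmd in size}


for cmd, (size_kb, res_kb) in totals_by_command(TOP_ROWS).items():
    print(f"{cmd}: SIZE {size_kb} KB, RES {res_kb} KB")
```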
When I run lsof on these PIDs, I see lots of files opened for writing on disk, for example:
httpd 9747 www  7w VREG 0,4  2522038 100509 /var (/dev/wd0e)
httpd 9747 www  8w VREG 0,4  1275830 141107 /var (/dev/wd0e)
httpd 9747 www  9w VREG 0,4 13516788 120627 /var (/dev/wd0e)
httpd 9747 www 10w VREG 0,5  5679147 189052 /usr (/dev/wd0f)
httpd 9747 www 11w VREG 0,4  1209919 120602 /var (/dev/wd0e)
httpd 9747 www 12w VREG 0,4   134602 130645 /var (/dev/wd0e)
httpd 9747 www 13w VREG 0,4   828934 120603 /var (/dev/wd0e)
httpd 9747 www 14w VREG 0,4    35181 110546 /var (/dev/wd0e)
httpd 9747 www 15w VREG 0,4      893 110543 /var (/dev/wd0e)
httpd 9747 www 16w VREG 0,4    20214 100493 /var (/dev/wd0e)
httpd 9747 www 17w VREG 0,4    62516 120619 /var (/dev/wd0e)
httpd 9747 www 18w VREG 0,4        0 100528 /var (/dev/wd0e)
httpd 9747 www 19w VREG 0,4 17391685 100560 /var (/dev/wd0e)
httpd 9747 www 20w VREG 0,4    26591 100503 /var (/dev/wd0e)
httpd 9747 www 21w VREG 0,4     1016 110539 /var (/dev/wd0e)
httpd 9747 www 22w VREG 0,4    10444 100539 /var (/dev/wd0e)
(There are many more open files for PID 9747; I didn't want to copy and
paste them all.)
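In lsof's FD column the letter after the descriptor number is the access mode (r read, w write, u read/write), so a listing like the one above shows files that httpd holds open for writing, possibly log files; an open descriptor by itself does not mean the process is blocked on disk I/O. A rough Python sketch of my own to summarise such a listing per filesystem, assuming the column order shown (COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME) with each entry on one line; the function name is hypothetical:

```python
# Sketch: count files opened for writing per filesystem in lsof output.
# Assumes one entry per line in the column order
# COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME, where NAME looks like
# "/var (/dev/wd0e)" as in the listing above.
from collections import defaultdict

LSOF_LINES = """\
httpd 9747 www  7w VREG 0,4  2522038 100509 /var (/dev/wd0e)
httpd 9747 www  8w VREG 0,4  1275830 141107 /var (/dev/wd0e)
httpd 9747 www  9w VREG 0,4 13516788 120627 /var (/dev/wd0e)
httpd 9747 www 10w VREG 0,5  5679147 189052 /usr (/dev/wd0f)
httpd 9747 www 11w VREG 0,4  1209919 120602 /var (/dev/wd0e)
httpd 9747 www 12w VREG 0,4   134602 130645 /var (/dev/wd0e)
httpd 9747 www 13w VREG 0,4   828934 120603 /var (/dev/wd0e)
httpd 9747 www 14w VREG 0,4    35181 110546 /var (/dev/wd0e)
httpd 9747 www 15w VREG 0,4      893 110543 /var (/dev/wd0e)
httpd 9747 www 16w VREG 0,4    20214 100493 /var (/dev/wd0e)
httpd 9747 www 17w VREG 0,4    62516 120619 /var (/dev/wd0e)
httpd 9747 www 18w VREG 0,4        0 100528 /var (/dev/wd0e)
httpd 9747 www 19w VREG 0,4 17391685 100560 /var (/dev/wd0e)
httpd 9747 www 20w VREG 0,4    26591 100503 /var (/dev/wd0e)
httpd 9747 www 21w VREG 0,4     1016 110539 /var (/dev/wd0e)
httpd 9747 www 22w VREG 0,4    10444 100539 /var (/dev/wd0e)
"""


def write_fds_by_fs(text):
    """Return {mount point: (descriptor count, total bytes)} for FDs open for writing."""
    count = defaultdict(int)
    total = defaultdict(int)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 9:  # skip blank or truncated lines
            continue
        fd, size, mount = fields[3], fields[6], fields[8]
        # lsof appends the access mode to the FD number: r, w or u (read/write).
        if fd[-1] in "wu":
            count[mount] += 1
            total[mount] += int(size)
    return {m: (count[m], total[m]) for m in count}


for mount, (n, size) in write_fds_by_fs(LSOF_LINES).items():
    print(f"{mount}: {n} files open for writing, {size} bytes in total")
```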
dmesg says this about my hard disk drives:
wd0 at pciide0 channel 0 drive 0: <MAXTOR 6L020J1>
wd0: drive supports 16-sector PIO transfers, LBA addressing
wd0: 19595 MB, 39813 cyl, 16 head, 63 sec, 512 bytes/sect x 40132503 sectors
wd0: 32-bit data port
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
wd1 at pciide0 channel 0 drive 1: <Maxtor 32049H2>
wd1: drive supports 16-sector PIO transfers, LBA addressing
wd1: 19541 MB, 39704 cyl, 16 head, 63 sec, 512 bytes/sect x 40021632 sectors
wd1: 32-bit data port
wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 2 (Ultra/33) (using DMA data transfers)
wd1(pciide0:0:1): using PIO mode 4, Ultra-DMA mode 2 (Ultra/33) (using DMA data transfers)
boot device: wd0
root on wd0a dumps on wd0b
wd0: transfer error, downgrading to Ultra-DMA mode 1
wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 1 (using DMA data transfers)
wd1(pciide0:0:1): using PIO mode 4, Ultra-DMA mode 2 (Ultra/33) (using DMA data transfers)
wd0a: DMA error reading fsbn 3468096 of 3468096-3468111 (wd0 bn 3468159; cn 3440 tn 10 sn 9), retrying
wd0: soft error (corrected)
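To pick the disk-related errors out of a longer dmesg, something like this Python sketch could be used. The regular expressions are my own guesses based on the messages quoted above, not taken from the wd(4) driver, so other error formats may be missed, and the function name is hypothetical. On its own, one corrected soft error and one downgrade of the transfer mode is not conclusive either way.

```python
# Sketch: count wd(4) error lines per drive in dmesg output and collect
# the filesystem block numbers (fsbn) that failed to read.
# The patterns are assumptions based on the messages quoted above.
import re
from collections import Counter

DMESG = """\
wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 2 (Ultra/33) (using DMA data transfers)
wd1(pciide0:0:1): using PIO mode 4, Ultra-DMA mode 2 (Ultra/33) (using DMA data transfers)
boot device: wd0
root on wd0a dumps on wd0b
wd0: transfer error, downgrading to Ultra-DMA mode 1
wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 1 (using DMA data transfers)
wd1(pciide0:0:1): using PIO mode 4, Ultra-DMA mode 2 (Ultra/33) (using DMA data transfers)
wd0a: DMA error reading fsbn 3468096 of 3468096-3468111 (wd0 bn 3468159; cn 3440 tn 10 sn 9), retrying
wd0: soft error (corrected)
"""

# Drive name, optional partition letter, optional "(controller:channel:drive)",
# then a message containing the word "error".
ERROR_LINE = re.compile(r"^(wd\d+)[a-p]?(?:\([^)]*\))?: .*\berror\b")
FAILED_READ = re.compile(r"error reading fsbn (\d+)")


def scan_dmesg(text):
    """Return (error lines per drive, list of fsbn values that failed to read)."""
    errors = Counter()
    blocks = []
    for line in text.splitlines():
        match = ERROR_LINE.match(line)
        if match:
            errors[match.group(1)] += 1
        failed = FAILED_READ.search(line)
        if failed:
            blocks.append(int(failed.group(1)))
    return errors, blocks


errors, blocks = scan_dmesg(DMESG)
print("error lines per drive:", dict(errors))
print("failed reads at fsbn:", blocks)
```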
Could it be that my hard disk is faulty, and that the processes hang on
disk access, turning them into memory-consuming zombies?
Thank you all for the help,
Vincent van Scherpenseel