Subject: Re: Excessive swapping / Memory problems
To: Chuck Swiger <cswiger@mac.com>
From: Vincent van Scherpenseel <mailinglists@vanscherpenseel.nl>
List: netbsd-users
Date: 09/06/2006 21:40:55
Chuck Swiger wrote:

>> Since last week I have been having problems with processes getting
>> killed out of nowhere on my NetBSD 1.6.2 machine. I searched
>> /var/log/messages and learned that this is due to the machine
>> running out of swap.
> 
> "top -o size" will show you which processes are requiring lots of 
> memory; it's possible that one of them is leaking, or it may simply be 
> that you are trying to run too much on a machine with limited RAM.  
> Increasing the amount of RAM in your machine is probably going to 
> improve the performance of the system significantly...

The strange thing is that it's a very lightly loaded machine, only running
an SMTP server for my daily mail (about 50 messages a day) and a 
webserver for my personal website (http://vincent.vanscherpenseel.nl). 
Most of the visitors are spambots targeting the blog comment system (I 
really need to implement a CAPTCHA check, I know).



Here is the output of top -o size:

   PID USERNAME PRI NICE   SIZE   RES STATE      TIME   WCPU    CPU COMMAND
  9747 www       -4    0    35M    4K semwait    0:32  0.00%  0.00% httpd
  9700 www       -4    0    35M    4K semwait    0:31  0.00%  0.00% httpd
 11517 www       -4    0    34M    4K semwait    0:10  0.00%  0.00% httpd
 12290 www       -4    0    33M    4K semwait    0:37  0.00%  0.00% httpd
   363 mysql      2    4    33M 2884K select    25:29  0.00%  0.00% mysqld
 11943 www       -4    0    32M    4K semwait    0:35  0.00%  0.00% httpd
 27633 www       -4    0    28M    4K semwait    0:01  0.00%  0.00% httpd
  9697 www       -4    0    27M    4K semwait    0:20  0.00%  0.00% httpd
 11518 www       -4    0    24M 2204K semwait    0:07  0.00%  0.00% httpd
 27634 www        2    0  3104K  280K select     0:00  0.00%  0.00% httpd
 27635 www       -4    0  3080K 2572K semwait    0:00  0.00%  0.00% httpd
   355 root       2    0  2704K  356K select     0:12  0.00%  0.00% httpd
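To put the listing in perspective, here is a small Python sketch that totals the SIZE and RES columns per command. It is only an illustration: the helper names are hypothetical, and it assumes the 11-column layout shown above with K/M/G size suffixes. Note that SIZE also counts shared text and libraries in every httpd child, so summing it overstates real memory demand; a RES of only 4K against a SIZE of tens of megabytes suggests that almost none of those processes' pages are currently in RAM.

```python
from collections import defaultdict

# Multipliers from top's size suffixes to KiB (assumes K, M or G suffixes).
UNITS = {"K": 1, "M": 1024, "G": 1024 * 1024}


def to_kib(value: str) -> int:
    """Convert a top size field such as '35M' or '2884K' to KiB."""
    return int(value[:-1]) * UNITS[value[-1].upper()]


def summarize(top_text: str) -> dict:
    """Return {command: [processes, total SIZE in KiB, total RES in KiB]}."""
    totals = defaultdict(lambda: [0, 0, 0])
    for line in top_text.splitlines():
        fields = line.split()
        # Process rows have 11 columns and start with a numeric PID;
        # this skips the header line and anything else.
        if len(fields) != 11 or not fields[0].isdigit():
            continue
        entry = totals[fields[10]]
        entry[0] += 1
        entry[1] += to_kib(fields[4])
        entry[2] += to_kib(fields[5])
    return dict(totals)


SAMPLE = """\
   PID USERNAME PRI NICE   SIZE   RES STATE      TIME   WCPU    CPU COMMAND
  9747 www       -4    0    35M    4K semwait    0:32  0.00%  0.00% httpd
  9700 www       -4    0    35M    4K semwait    0:31  0.00%  0.00% httpd
 11517 www       -4    0    34M    4K semwait    0:10  0.00%  0.00% httpd
 12290 www       -4    0    33M    4K semwait    0:37  0.00%  0.00% httpd
   363 mysql      2    4    33M 2884K select    25:29  0.00%  0.00% mysqld
 11943 www       -4    0    32M    4K semwait    0:35  0.00%  0.00% httpd
 27633 www       -4    0    28M    4K semwait    0:01  0.00%  0.00% httpd
  9697 www       -4    0    27M    4K semwait    0:20  0.00%  0.00% httpd
 11518 www       -4    0    24M 2204K semwait    0:07  0.00%  0.00% httpd
 27634 www        2    0  3104K  280K select     0:00  0.00%  0.00% httpd
 27635 www       -4    0  3080K 2572K semwait    0:00  0.00%  0.00% httpd
   355 root       2    0  2704K  356K select     0:12  0.00%  0.00% httpd
"""

for command, (count, size, res) in sorted(summarize(SAMPLE).items()):
    print(f"{command}: {count} process(es), SIZE {size} KiB, RES {res} KiB")
```

For the listing above this gives 11 httpd processes with a combined SIZE of 262840 KiB but only 5440 KiB resident.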



When I run lsof on these PIDs, I see lots of files open for writing on disk, for example:

httpd      9747      www    7w  VREG        0,4   2522038 100509 /var (/dev/wd0e)
httpd      9747      www    8w  VREG        0,4   1275830 141107 /var (/dev/wd0e)
httpd      9747      www    9w  VREG        0,4  13516788 120627 /var (/dev/wd0e)
httpd      9747      www   10w  VREG        0,5   5679147 189052 /usr (/dev/wd0f)
httpd      9747      www   11w  VREG        0,4   1209919 120602 /var (/dev/wd0e)
httpd      9747      www   12w  VREG        0,4    134602 130645 /var (/dev/wd0e)
httpd      9747      www   13w  VREG        0,4    828934 120603 /var (/dev/wd0e)
httpd      9747      www   14w  VREG        0,4     35181 110546 /var (/dev/wd0e)
httpd      9747      www   15w  VREG        0,4       893 110543 /var (/dev/wd0e)
httpd      9747      www   16w  VREG        0,4     20214 100493 /var (/dev/wd0e)
httpd      9747      www   17w  VREG        0,4     62516 120619 /var (/dev/wd0e)
httpd      9747      www   18w  VREG        0,4         0 100528 /var (/dev/wd0e)
httpd      9747      www   19w  VREG        0,4  17391685 100560 /var (/dev/wd0e)
httpd      9747      www   20w  VREG        0,4     26591 100503 /var (/dev/wd0e)
httpd      9747      www   21w  VREG        0,4      1016 110539 /var (/dev/wd0e)
httpd      9747      www   22w  VREG        0,4     10444 100539 /var (/dev/wd0e)
(There are many more open files for PID 9747; I didn't want to copy and
paste them all.)
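lsof reports open file descriptors rather than individual disk calls, and the letter after the FD number is the access mode (r = read, w = write, u = read/write). As a rough illustration, this hypothetical Python helper tallies the descriptors opened for writing per device; it assumes the column layout shown above, with the device in parentheses as the last field, and is not a general lsof parser.

```python
from collections import Counter


def write_fds_by_device(lsof_text: str) -> Counter:
    """Count descriptors opened for writing ('w') or read/write ('u'),
    keyed by the device lsof prints in parentheses after the file name."""
    counts = Counter()
    for line in lsof_text.splitlines():
        fields = line.split()
        # Expect: COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME (device)
        if len(fields) < 9 or not fields[1].isdigit():
            continue
        fd = fields[3]
        # Numeric FDs carry a mode letter (r, w or u), possibly followed by
        # a lock letter; named entries such as cwd or txt are skipped.
        if not fd[0].isdigit():
            continue
        mode = fd.lstrip("0123456789")[:1]
        last = fields[-1]
        if mode in ("w", "u") and last.startswith("("):
            counts[last.strip("()")] += 1
    return counts


SAMPLE = """\
httpd      9747      www    7w  VREG        0,4   2522038 100509 /var (/dev/wd0e)
httpd      9747      www    8w  VREG        0,4   1275830 141107 /var (/dev/wd0e)
httpd      9747      www    9w  VREG        0,4  13516788 120627 /var (/dev/wd0e)
httpd      9747      www   10w  VREG        0,5   5679147 189052 /usr (/dev/wd0f)
httpd      9747      www   11w  VREG        0,4   1209919 120602 /var (/dev/wd0e)
httpd      9747      www   12w  VREG        0,4    134602 130645 /var (/dev/wd0e)
httpd      9747      www   13w  VREG        0,4    828934 120603 /var (/dev/wd0e)
httpd      9747      www   14w  VREG        0,4     35181 110546 /var (/dev/wd0e)
httpd      9747      www   15w  VREG        0,4       893 110543 /var (/dev/wd0e)
httpd      9747      www   16w  VREG        0,4     20214 100493 /var (/dev/wd0e)
httpd      9747      www   17w  VREG        0,4     62516 120619 /var (/dev/wd0e)
httpd      9747      www   18w  VREG        0,4         0 100528 /var (/dev/wd0e)
httpd      9747      www   19w  VREG        0,4  17391685 100560 /var (/dev/wd0e)
httpd      9747      www   20w  VREG        0,4     26591 100503 /var (/dev/wd0e)
httpd      9747      www   21w  VREG        0,4      1016 110539 /var (/dev/wd0e)
httpd      9747      www   22w  VREG        0,4     10444 100539 /var (/dev/wd0e)
"""

print(write_fds_by_device(SAMPLE))
```

For the sixteen lines quoted above it reports 15 descriptors on /dev/wd0e and 1 on /dev/wd0f.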



dmesg says this about my hard disk drives:

wd0 at pciide0 channel 0 drive 0: <MAXTOR 6L020J1>
wd0: drive supports 16-sector PIO transfers, LBA addressing
wd0: 19595 MB, 39813 cyl, 16 head, 63 sec, 512 bytes/sect x 40132503 sectors
wd0: 32-bit data port
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
wd1 at pciide0 channel 0 drive 1: <Maxtor 32049H2>
wd1: drive supports 16-sector PIO transfers, LBA addressing
wd1: 19541 MB, 39704 cyl, 16 head, 63 sec, 512 bytes/sect x 40021632 sectors
wd1: 32-bit data port
wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 2 (Ultra/33) (using DMA data transfers)
wd1(pciide0:0:1): using PIO mode 4, Ultra-DMA mode 2 (Ultra/33) (using DMA data transfers)
boot device: wd0
root on wd0a dumps on wd0b
wd0: transfer error, downgrading to Ultra-DMA mode 1
wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 1 (using DMA data transfers)
wd1(pciide0:0:1): using PIO mode 4, Ultra-DMA mode 2 (Ultra/33) (using DMA data transfers)
wd0a: DMA error reading fsbn 3468096 of 3468096-3468111 (wd0 bn 3468159; cn 3440 tn 10 sn 9), retrying
wd0: soft error (corrected)
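To pick the disk warnings out of a longer boot log, a minimal Python sketch could filter dmesg text for the phrases seen above. The pattern list is an assumption taken only from the wd0 messages quoted here; other drivers or NetBSD versions may word their errors differently, so this is not an exhaustive check.

```python
import re

# Hypothetical pattern built only from the wd(4) messages quoted above.
DISK_TROUBLE = re.compile(r"transfer error|DMA error|soft error|downgrading")


def disk_warnings(dmesg_text: str) -> list:
    """Return the dmesg lines that mention one of the error phrases."""
    return [line for line in dmesg_text.splitlines() if DISK_TROUBLE.search(line)]


SAMPLE = """\
wd0 at pciide0 channel 0 drive 0: <MAXTOR 6L020J1>
wd0: drive supports 16-sector PIO transfers, LBA addressing
wd0: 19595 MB, 39813 cyl, 16 head, 63 sec, 512 bytes/sect x 40132503 sectors
wd0: 32-bit data port
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
wd1 at pciide0 channel 0 drive 1: <Maxtor 32049H2>
wd1: drive supports 16-sector PIO transfers, LBA addressing
wd1: 19541 MB, 39704 cyl, 16 head, 63 sec, 512 bytes/sect x 40021632 sectors
wd1: 32-bit data port
wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 2 (Ultra/33) (using DMA data transfers)
wd1(pciide0:0:1): using PIO mode 4, Ultra-DMA mode 2 (Ultra/33) (using DMA data transfers)
boot device: wd0
root on wd0a dumps on wd0b
wd0: transfer error, downgrading to Ultra-DMA mode 1
wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 1 (using DMA data transfers)
wd1(pciide0:0:1): using PIO mode 4, Ultra-DMA mode 2 (Ultra/33) (using DMA data transfers)
wd0a: DMA error reading fsbn 3468096 of 3468096-3468111 (wd0 bn 3468159; cn 3440 tn 10 sn 9), retrying
wd0: soft error (corrected)
"""

for warning in disk_warnings(SAMPLE):
    print(warning)
```

On this log all three matches come from wd0; wd1 reports nothing.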



Could it be that my hard disk is somehow faulty, and that the processes hang
while waiting on the disk, turning them into memory-consuming zombies?

Thank you all for the help,
Vincent van Scherpenseel