Subject: Re: System slowly dying under heavy UDP load.
To: None <liman@autonomica.se>
From: David Burgess <burgess@neonramp.com>
List: port-i386
Date: 06/25/2002 15:55:48
> itojun@iijlab.net:
>> 	what does vmstat -m and/or netstat -m say?
>
> How could I tell? I can't even log in! ;-)
>
> The problem is that the server is located in a very remote facility,
> and going there is a major pain and nothing is problematic until you
> can't login anymore (sic!). I'll try to re-route it to a computer room
> in my galaxy and see what I can get out of it.

If it's any consolation, I run a DNS server with about 1000 domains (which
appears to be an order of magnitude smaller than yours), and the only times
I've ever seen this were when my network card locked itself in 10 Mbps mode
and when I was having a hard disk problem.  The queue just keeps getting
deeper, and the remotes are no less voracious as time goes by.

For remote testing, there are a couple of things that you could try.  An
obvious one would be to set up NetSaint and watch the vmstat and netstat
numbers remotely over time.  You could set up a printer in your remote
facility and redirect the output of a vmstat/netstat job to the raw printer
device.  Old Epsons work best for this (a trick I learned with my IDS).
You could also have the system mail you the current statistics every
couple of minutes.  You could NFS-mount a drive from your local machine on
the server and have vmstat/netstat append their current numbers to a file
there.  You could even do that with a file on the server's own hard drive
and read the file when the system comes back up.
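
A minimal sketch of the append-to-a-file idea, written in Python purely for
illustration (a few lines of sh run from cron would do just as well).  The
function name and the log path in the usage note are made up; the only
commands assumed are the vmstat -m and netstat -m already mentioned in this
thread.  Each snapshot is flushed to disk right away, so the last entries
should survive a hang followed by a power cycle.

```python
import os
import subprocess
import time


def append_snapshot(log_path, commands):
    """Append a timestamped dump of each command's output to log_path."""
    with open(log_path, "a") as log:
        log.write("=== %s ===\n" % time.strftime("%Y-%m-%d %H:%M:%S"))
        for argv in commands:
            log.write("$ %s\n" % " ".join(argv))
            try:
                result = subprocess.run(
                    argv,
                    stdout=subprocess.PIPE,
                    stderr=subprocess.STDOUT,
                    universal_newlines=True,
                    timeout=30,
                )
                log.write(result.stdout)
            except (OSError, subprocess.TimeoutExpired) as exc:
                # A machine that is running out of resources may fail to
                # start the command at all; record that instead of dying.
                log.write("(failed: %s)\n" % exc)
        # Push the data to disk now; the box may hang before the next run.
        log.flush()
        os.fsync(log.fileno())
```

Run from cron every couple of minutes, a call might look like
append_snapshot("/var/log/memstats.log", [["vmstat", "-m"], ["netstat", "-m"]]),
where the path is only a placeholder; pointing it at an NFS mount gives the
remote variant.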

I saw a booth at Networld+Interop for a company that sells remote power
control systems (Lighthouse?) so that you can remotely reboot the system
whenever you need to, thereby clearing the problem.  For that matter, you
could have the system reboot automatically every couple of hours on the
hour through cron....  You could also set up your own 'ping-o-death'
service on the machine.  Ping it with a specific set of ICMP message types
in just the right order, and the server reboots; think of it as a
combination lock for the power switch.
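
The "combination lock" part could be sketched roughly as below.  This is only
the sequence-matching logic, in Python for illustration: capturing the ICMP
packets (which needs a raw socket and root) and the reboot itself are left
out on purpose.  The class name, the time window, and the particular
sequence of ICMP types in the usage note are all hypothetical, and the
sequence is assumed to be non-empty.

```python
import time


class KnockDetector:
    """Fires once a fixed sequence of ICMP types arrives in order, in time."""

    def __init__(self, sequence, window=10.0, clock=time.monotonic):
        self.sequence = list(sequence)  # expected ICMP types, in order
        self.window = window            # seconds allowed for the whole sequence
        self.clock = clock              # injectable so the logic can be tested
        self.position = 0               # how many knocks have matched so far
        self.started = None             # time of the first matching knock

    def feed(self, icmp_type):
        """Record one observed ICMP type; return True when the lock opens."""
        now = self.clock()
        if self.position and now - self.started > self.window:
            self.position = 0  # too slow: start over
        if icmp_type == self.sequence[self.position]:
            if self.position == 0:
                self.started = now
            self.position += 1
        else:
            # Wrong knock: reset, but let it count as a fresh first knock
            # if it happens to match the start of the sequence.
            self.position = 1 if icmp_type == self.sequence[0] else 0
            self.started = now
        if self.position == len(self.sequence):
            self.position = 0
            return True
        return False
```

For example, KnockDetector([8, 13, 17, 8]) would open on an echo request, a
timestamp request, an address mask request and another echo request arriving
in that order within ten seconds.  Anyone who can sniff the traffic can
replay the sequence, so this is a convenience, not security.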

Here's some of my system information, for comparison.

$ uname -a
NetBSD ns1.neonramp.com 1.5 NetBSD 1.5 (NEONRAMP-RADIUS) #0: Sat Feb  3 07:42:20 CST 2001

$ uptime
3:44PM  up 149 days, 20:41, 2 users, load averages: 0.13, 0.12, 0.08

$ vmstat -m
Memory resource pool statistics
Name         Size   Requests Fail   Releases Pgreq Pgrel Npage Hiwat Minpg Maxpg Idle
phpool         40        115    0          0     2     0     2     2     0   inf    0
pmappl         68     513732    0     513695     3     0     3     3     0   inf    2
vmsppl        188     513732    0     513695     6     0     6     6     0   inf    3
vmmpepl        64   12313756    0   12313152    26     0    26    26     0   inf   14
uaoeltpl       84          0    0          0     0     0     0     0     0   inf    0
aobjpl         52          0    0          0     0     0     0     0     0   inf    0
amappl         40    4700508    0    4700183     9     0     9     9     0   inf    4
mbpl          256 1273704228    0 1273704211    37     0    37    37     1   inf   35
mclpl        2048   38470851    0   38470850    29     0    29    29     4   128   28
sockpl        164    1634741    0    1634602    15     0    15    15     0   inf    7
ttypl         284         70    0          0     5     0     5     5     0   inf    0
rndsample     528     135441    0     135437     1     0     1     1     0   inf    0
procpl        404     514157    0     514111    13     0    13    13     0   inf    7
pgrppl         24      47730    0      47701     1     0     1     1     0   inf    0
pcredpl        24     514157    0     514111     1     0     1     1     0   inf    0
plimitpl      156       4101    0       4094     1     0     1     1     0   inf    0
rusgepl        72     514111    0     514111     1     0     1     1     0   inf    1
filepl         48    8223602    0    8223400     5     0     5     5     0   inf    1
cwdipl         12     514148    0     514111     1     0     1     1     0   inf    0
fdescpl       124     514148    0     514111     4     0     4     4     0   inf    2
vnodepl       208       9466    0          0   499     0   499   499     0   inf    0
ncachepl       72       9466    0          0   170     0   170   170     0   inf    0
ffsinopl      216    1501848    0    1492419   525     0   525   525     0   inf    0
ext2fsinopl   216          0    0          0     0     0     0     0     0   inf    0
lfsinopl      216          0    0          0     0     0     0     0     0   inf    0
nfsnodepl     204      75861    0      75843   491     0   491   491     0   inf  483
nfsvapl       100      75861    0      75843   233     0   233   233     0   inf  227
cd9660nopl    108          0    0          0     0     0     0     0     0   inf    0
msdosnopl     100          0    0          0     0     0     0     0     0   inf    0
wdcspl         48    8280901    0    8280901     1     0     1     1     0   inf    1
extent         20        262    0        245     1     0     1     1     0   inf    0
scxspl        148          0    0          0     0     0     0     0     0   inf    0
bufpl         124     589896    0     589896     1     0     1     1     0   inf    1
ccdpl         140          0    0          0     0     0     0     0     0   inf    0
rtentpl       128     957432    0     957161    13     0    13    13     0   inf    3
inpcbpl        96     876257    0     876219     4     0     4     4     0   inf    2
rttmrpl        32          0    0          0     0     0     0     0     0   inf    0
ipqepl         40     123400    0     123400     1     0     1     1     0   inf    1
tcpcbpl       176     705912    0     705884     7     0     7     7     0   inf    5
synpl         168     608609    0     608609     1     0     1     1     0   inf    1
sigapl        840     514148    0     514111    30     0    30    30     0   inf   19
swp buf       152          0    0          0     0     0     0     0     0   inf    0
swp vnx        20          0    0          0     0     0     0     0     0   inf    0
swp vnd       128          0    0          0     0     0     0     0     0   inf    0
In use 4805K, total allocated 8548K; utilization 56.2%
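
When comparing two machines, or one machine over time, the columns worth
watching are probably Fail and the gap between Requests and Releases.  A rough
Python sketch for pulling those out of saved vmstat -m output follows.  It
assumes every row ends in the eleven numeric columns shown in the header and
that Requests minus Releases approximates the number of items still
allocated; the function names are made up, and the sample rows are three of
the pools from the output above.  Rows where wide numbers have run together
(as can happen with the mbpl counters) would need repairing by hand first.

```python
def parse_pool_line(line):
    """Split one vmstat -m pool row into (name, fields).

    The last 11 whitespace-separated tokens are the numeric columns; whatever
    precedes them is the pool name, which may contain a space ("swp buf").
    """
    tokens = line.split()
    if len(tokens) < 12:
        raise ValueError("not a pool row: %r" % line)
    name = " ".join(tokens[:-11])
    keys = ("size", "requests", "fail", "releases", "pgreq", "pgrel",
            "npage", "hiwat", "minpg", "maxpg", "idle")
    fields = {}
    for key, tok in zip(keys, tokens[-11:]):
        # Maxpg is printed as "inf" when the pool has no page limit.
        fields[key] = None if tok == "inf" else int(tok)
    return name, fields


def summarize(lines):
    """Return {name: (outstanding items, failed requests)} for each row."""
    summary = {}
    for line in lines:
        name, f = parse_pool_line(line)
        summary[name] = (f["requests"] - f["releases"], f["fail"])
    return summary


SAMPLE = [
    "mclpl 2048 38470851 0 38470850 29 0 29 29 4 128 28",
    "sockpl 164 1634741 0 1634602 15 0 15 15 0 inf 7",
    "swp buf 152 0 0 0 0 0 0 0 0 inf 0",
]

for pool, (outstanding, failed) in sorted(summarize(SAMPLE).items()):
    print("%-12s outstanding=%d fail=%d" % (pool, outstanding, failed))
```

A pool whose outstanding count keeps climbing between snapshots, or whose
Fail column is non-zero, would be the first place to look.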

$ netstat -m
8 mbufs in use:
        2 mbufs allocated to data
        4 mbufs allocated to packet headers
        2 mbufs allocated to socket names and addresses
1/58 mapped pages in use
264 Kbytes allocated to network (1% in use)
0 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines

I'll upgrade this machine to 1.6Beta 2 soon and see if it makes a difference.

Finally, I remember something about some of the later named8 versions
being memory hogs.  I'm running named9, which solved some of those
problems and might help.
-- 
Dave Burgess
CTO, Nebraska On-Ramp
Chief Engineer, Mitec Internet Services
Bellevue, NE 68123