Subject: Re: System slowly dying under heavy UDP load.
To: None <liman@autonomica.se>
From: David Burgess <burgess@neonramp.com>
List: current-users
Date: 06/25/2002 15:55:48
> itojun@iijlab.net:
>> what does vmstat -m and/or netstat -m say?
>
> How could I tell? I can't even log in! ;-)
>
> The problem is that the server is located in a very remote facility,
> and going there is a major pain and nothing is problematic until you
> can't login anymore (sic!). I'll try to re-route it to a computer room
> in my galaxy and see what I can get out of it.

If it's any consolation, I run a DNS server with about 1000 domains (which
appears to be an order of magnitude fewer than yours), and the only times
I've ever seen this were when my network card locked itself in 10 Mbps mode
and when I was having a hard disk problem. In both cases the queue just
keeps getting deeper, and the remote clients are no less voracious as time
goes by.

For remote testing, there are a couple of things that you could try. An
obvious one would be to set up NetSaint and watch the vmstat and netstat
numbers remotely over time. You could set up a printer in your remote
facility and redirect the output of a vmstat/netstat job to the raw printer
device. Old Epsons work best for this (a trick I learned with my IDS).
You could also have the system mail you the current statistics every
couple of minutes. You could NFS-mount a drive from your local machine on
the server and have vmstat/netstat append their current numbers to a file
there. You could even do that with a file on the local hard drive and read
the file when the system comes back up.
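
The log-to-a-file idea could be sketched roughly as follows. This is only
a minimal example, not a tested monitoring setup: the LOGFILE variable, its
default path, the script location in the crontab comment, and the
snapshot function name are all assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: append timestamped vmstat/netstat snapshots to a log file.
# LOGFILE and its default path are assumptions; point it at a local disk
# or an NFS-mounted directory as appropriate.
LOGFILE="${LOGFILE:-/var/tmp/memstats.log}"

snapshot() {
    # $1 is the file to append to. Tool errors are written to the log as
    # well, so a failing command still leaves a trace in the file.
    {
        echo "=== $(date '+%Y-%m-%d %H:%M:%S') ==="
        vmstat -m 2>&1 || echo "vmstat -m failed"
        netstat -m 2>&1 || echo "netstat -m failed"
    } >> "$1"
}

# Take one snapshot per invocation; let cron handle the scheduling.
snapshot "$LOGFILE"

# Hypothetical crontab entry to run this every two minutes:
# */2 * * * * /usr/local/sbin/memstats.sh
```

Because each snapshot is appended, the last few entries before a hang
should still be in the file after the reboot. To mail the numbers instead,
the same two commands could be piped to mail(1), e.g.
`{ vmstat -m; netstat -m; } | mail -s memstats you@example.com` (the
address is a placeholder).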

I saw a booth at Networld+Interop for a company that sells remote power
control systems (Lighthouse?) so that you can remotely reboot the system
whenever you need to, thereby clearing the problem. For that matter, you
could have the system automatically reboot every couple of hours on the
hour through cron.... You could also set up your own 'ping-o-death'
service on the machine. Ping to a specific set of ICMP message types in
just the right order, and the server reboots; think of it as a combination
lock for the power switch.

Here's some of my system information, for comparison.

$ uname -a
NetBSD ns1.neonramp.com 1.5 NetBSD 1.5 (NEONRAMP-RADIUS) #0: Sat Feb 3 07:42:20 CST 2001
$ uptime
3:44PM up 149 days, 20:41, 2 users, load averages: 0.13, 0.12, 0.08
$ vmstat -m
Memory resource pool statistics
Name         Size   Requests Fail   Releases Pgreq Pgrel Npage Hiwat Minpg Maxpg Idle
phpool         40        115    0          0     2     0     2     2     0   inf    0
pmappl         68     513732    0     513695     3     0     3     3     0   inf    2
vmsppl        188     513732    0     513695     6     0     6     6     0   inf    3
vmmpepl        64   12313756    0   12313152    26     0    26    26     0   inf   14
uaoeltpl       84          0    0          0     0     0     0     0     0   inf    0
aobjpl         52          0    0          0     0     0     0     0     0   inf    0
amappl         40    4700508    0    4700183     9     0     9     9     0   inf    4
mbpl          256 1273704228    0 1273704211    37     0    37    37     1   inf   35
mclpl        2048   38470851    0   38470850    29     0    29    29     4   128   28
sockpl        164    1634741    0    1634602    15     0    15    15     0   inf    7
ttypl         284         70    0          0     5     0     5     5     0   inf    0
rndsample     528     135441    0     135437     1     0     1     1     0   inf    0
procpl        404     514157    0     514111    13     0    13    13     0   inf    7
pgrppl         24      47730    0      47701     1     0     1     1     0   inf    0
pcredpl        24     514157    0     514111     1     0     1     1     0   inf    0
plimitpl      156       4101    0       4094     1     0     1     1     0   inf    0
rusgepl        72     514111    0     514111     1     0     1     1     0   inf    1
filepl         48    8223602    0    8223400     5     0     5     5     0   inf    1
cwdipl         12     514148    0     514111     1     0     1     1     0   inf    0
fdescpl       124     514148    0     514111     4     0     4     4     0   inf    2
vnodepl       208       9466    0          0   499     0   499   499     0   inf    0
ncachepl       72       9466    0          0   170     0   170   170     0   inf    0
ffsinopl      216    1501848    0    1492419   525     0   525   525     0   inf    0
ext2fsinopl   216          0    0          0     0     0     0     0     0   inf    0
lfsinopl      216          0    0          0     0     0     0     0     0   inf    0
nfsnodepl     204      75861    0      75843   491     0   491   491     0   inf  483
nfsvapl       100      75861    0      75843   233     0   233   233     0   inf  227
cd9660nopl    108          0    0          0     0     0     0     0     0   inf    0
msdosnopl     100          0    0          0     0     0     0     0     0   inf    0
wdcspl         48    8280901    0    8280901     1     0     1     1     0   inf    1
extent         20        262    0        245     1     0     1     1     0   inf    0
scxspl        148          0    0          0     0     0     0     0     0   inf    0
bufpl         124     589896    0     589896     1     0     1     1     0   inf    1
ccdpl         140          0    0          0     0     0     0     0     0   inf    0
rtentpl       128     957432    0     957161    13     0    13    13     0   inf    3
inpcbpl        96     876257    0     876219     4     0     4     4     0   inf    2
rttmrpl        32          0    0          0     0     0     0     0     0   inf    0
ipqepl         40     123400    0     123400     1     0     1     1     0   inf    1
tcpcbpl       176     705912    0     705884     7     0     7     7     0   inf    5
synpl         168     608609    0     608609     1     0     1     1     0   inf    1
sigapl        840     514148    0     514111    30     0    30    30     0   inf   19
swp buf       152          0    0          0     0     0     0     0     0   inf    0
swp vnx        20          0    0          0     0     0     0     0     0   inf    0
swp vnd       128          0    0          0     0     0     0     0     0   inf    0

In use 4805K, total allocated 8548K; utilization 56.2%
$ netstat -m
8 mbufs in use:
2 mbufs allocated to data
4 mbufs allocated to packet headers
2 mbufs allocated to socket names and addresses
1/58 mapped pages in use
264 Kbytes allocated to network (1% in use)
0 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines

I'll upgrade this machine to 1.6 Beta 2 soon and see if it makes a difference.
Finally, I remember something about some of the later named8 versions
being memory hogs. I'm running named9, which solved some of those
problems and might help.
--
Dave Burgess
CTO, Nebraska On-Ramp
Chief Engineer, Mitec Internet Services
Bellevue, NE 68123