Subject: Why do some network connections get stuck forever?
To: NetBSD tech-net list <tech-net@NetBSD.org>
From: Greg A. Woods <woods@weird.com>
List: tech-net
Date: 10/24/2005 14:56:32
Why do some network connections get stuck forever?  Every time something
"bad" happens to the network for an extended period (e.g. this time I
think the switch was rebooted and did not come back up for quite a few
minutes), some existing connections, sometimes many and always inbound
ones, get stuck forever until the server processes that hold them open
are killed.

I've seen these happen for any and every kind of TCP service, including
standard NetBSD services, such as ftpd, telnetd, etc.  In this example
it's a pop3d process from Cyrus IMAPd.

This problem has been present on servers I've been responsible for
since way back in the NetBSD-1.3 days, on every kind of platform I've
run.  At least now with 1.6.x killing the processes is not likely to
make the whole system panic soon afterwards (indeed the processes do die
and the sockets do seem to get closed properly, though perhaps there are
still some small memory leaks).

This particular process has been hung for over a month, and as you'll
see below the client that started it isn't even on the net at the moment.


14:04 [10] $ ps -up 18287
USER    PID %CPU  %MEM  VSZ  RSS TT STAT STARTED   TIME COMMAND
cyrus 18287  0.0 -305.9 3688 9112 ?? I    19Sep05 0:03.66 pop3d: pop3d: xtreme-12-108.dyn.aci.on.ca [69.17.171.108]   
14:04 [11] $ ps -lp 18287 
UID   PID PPID CPU PRI NI  VSZ  RSS WCHAN STAT TT   TIME COMMAND
120 18287  235   0   2  0 3688 9112 netio I    ?? 0:03.66 pop3d: pop3d: xtreme-12-108.dyn.aci.on.ca [69.17.171.108]  
14:04 [12] $ fstat -p 18287
USER     CMD          PID   FD MOUNT       INUM MODE         SZ|DV R/W
cyrus    pop3d      18287   wd /var/spool/imap 30402863 drwx------     512 r 
cyrus    pop3d      18287    0* internet stream tcp fffffc0118f36498 205.207.148.251:995 <-> 69.17.171.108:2068
cyrus    pop3d      18287    1* internet stream tcp fffffc0118f36498 205.207.148.251:995 <-> 69.17.171.108:2068
cyrus    pop3d      18287    2* internet stream tcp fffffc0118f36498 205.207.148.251:995 <-> 69.17.171.108:2068
cyrus    pop3d      18287    3* pipe 0xfffffc0086f7cc08 -> 0xfffffc0086f7caf0 w
cyrus    pop3d      18287    4* internet stream tcp fffffc00a8b94018 *:995
cyrus    pop3d      18287    5* unix dgram fffffe00007efa80 <-> fffffe0000a0ae80
cyrus    pop3d      18287    6 /var     1200842 -rw-------  1018832 rw
cyrus    pop3d      18287    7 /var     1200789 -rw-------  1711532 rw
cyrus    pop3d      18287    8* unix dgram fffffe00006fd500
cyrus    pop3d      18287    9 /var     1200802 -rw-------       0 rw
cyrus    pop3d      18287   10 /var     1200834 -rw-------      44 rw
cyrus    pop3d      18287   11 /var     1200778 -rw-------  7946240 rw
cyrus    pop3d      18287   12 /var     1200803 -rw-------  25296896 rw
14:04 [13] $ netstat -nA |fgrep fffffc0118f36498
fffffc0118f36498 tcp        0      0 205.207.148.251.995   69.17.171.108.2068    ESTABL
14:04 [13] $ /sbin/ping 69.17.171.108
PING xtreme-12-108.dyn.aci.on.ca (69.17.171.108): 48 data bytes
92 bytes from 3500-1.aci.on.ca (205.207.148.6): Destination Host Unreachable for icmp_seq=0
92 bytes from 3500-1.aci.on.ca (205.207.148.6): Destination Host Unreachable for icmp_seq=1
92 bytes from 3500-1.aci.on.ca (205.207.148.6): Destination Host Unreachable for icmp_seq=2
^?
----xtreme-12-108.dyn.aci.on.ca PING Statistics----
5 packets transmitted, 0 packets received, 100.0% packet loss
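
One possible explanation for the state shown above (an assumption, not
something this output proves) is that the server is blocked reading a
socket on which TCP keepalive probes are not enabled, so it never learns
that the peer has gone away.  A minimal sketch in C of turning on
SO_KEEPALIVE for an accepted socket follows; enable_keepalive() is a
hypothetical helper name, not anything taken from Cyrus or NetBSD.

```c
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: ask the kernel to send TCP keepalive probes on an
 * idle connection, so that a dead peer eventually produces an error on
 * the socket instead of leaving it in ESTABLISHED indefinitely.
 * Returns 0 on success, -1 on failure (errno is set by setsockopt).
 */
int
enable_keepalive(int fd)
{
	int on = 1;

	return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on));
}
```

A server would call this on the descriptor returned by accept(2).  The
delay before the first probe is a system-wide setting and is usually
long by default (on the order of hours), so this only bounds how long a
dead connection can linger; it does not detect the failure quickly.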

-- 
						Greg A. Woods

H:+1 416 218-0098  W:+1 416 489-5852 x122  VE3TCP  RoboHack <woods@robohack.ca>
Planix, Inc. <woods@planix.com>          Secrets of the Weird <woods@weird.com>