Subject: poll / tcp weirdness
To: None <tech-net@netbsd.org>
From: Wolfgang Rupprecht <wolfgang@wsrcc.com>
List: tech-net
Date: 08/25/1999 08:49:56
I am seeing a weirdness in poll() in a small number of tcp
connections.  Basically the code does the following:

	pollfd[i].events = POLLOUT;                 

	poll(pollfd, numpollfds, 0);             

	if (pollfd[i].revents & POLLOUT)
		write(pollfd[i].fd,...);

I just check each fd to double check that it won't block on me and
then feed it.  The problem is that on some connections I see poll()
never returning the happy feed-me status.  This is occurs even when
the remote system is offering an almost full tcp-window (eg 33 out of
17k used).

Is poll an overly conservative routine that sometimes errs on the side
of not indicating something is writable?  What would cause it to think
that a write to a tcp/ipv4 socket would block?

    tcpdump:

    20:17:24.972881 209-246-58-153.chi1.foocom.net.36892 > c460058-a.frmt1.sfba.home.com.rtcm-sc104: S 48112894:48112894(0) win 8192 <mss 1460,nop,nop,sackOK> (DF)
    20:17:24.973081 c460058-a.frmt1.sfba.home.com.rtcm-sc104 > 209-246-58-153.chi1.foocom.net.36892: S 2174248251:2174248251(0) ack 48112895 win 16384 <mss 1460> (DF)
    20:17:25.073345 209-246-58-153.chi1.foocom.net.36892 > c460058-a.frmt1.sfba.home.com.rtcm-sc104: . ack 1 win 8760 (DF)
    20:17:25.082637 209-246-58-153.chi1.foocom.net.36892 > c460058-a.frmt1.sfba.home.com.rtcm-sc104: P 1:49(48) ack 1 win 8760 (DF)
    20:17:25.190590 c460058-a.frmt1.sfba.home.com.rtcm-sc104 > 209-246-58-153.chi1.foocom.net.36892: . ack 49 win 17472 (DF)
    20:17:26.590715 c460058-a.frmt1.sfba.home.com.rtcm-sc104 > 209-246-58-153.chi1.foocom.net.36892: P 1:34(33) ack 49 win 17472 (DF) [tos 0x10]
    20:17:30.886502 209-246-58-153.chi1.foocom.net.36892 > c460058-a.frmt1.sfba.home.com.rtcm-sc104: F 49:49(0) ack 34 win 8727 (DF)
    20:17:30.886717 c460058-a.frmt1.sfba.home.com.rtcm-sc104 > 209-246-58-153.chi1.foocom.net.36892: . ack 50 win 17520 (DF) [tos 0x10]
    20:17:31.590816 c460058-a.frmt1.sfba.home.com.rtcm-sc104 > 209-246-58-153.chi1.foocom.net.36892: P 34:67(33) ack 50 win 17520 (DF) [tos 0x10]
    20:17:31.694884 209-246-58-153.chi1.foocom.net.36892 > c460058-a.frmt1.sfba.home.com.rtcm-sc104: R 48112944:48112944(0) win 0 (DF)

    syslog:

    Aug 24 20:17:25 capsicum rtcmd[24481]: Accepting connection 3 port rtcm-sc104 209-246-58-153.chi1.foocom.net
    Aug 24 20:17:26 capsicum rtcmd[24481]: Log 3 209-246-58-153.chi1.foocom.net: HELO 192.168.0.4 Jeff's WinDGPS+IP Client 1.01
    Aug 24 20:17:32 capsicum rtcmd[24481]: Closing connection 3 port rtcm-sc104 209-246-58-153.chi1.foocom.net (c2) 7.0 sec 4 drops

Notice there are "4 dropped writes".  Basically a poll() that didn't
find the fd writable 4x.  The write frequency should be once per
second.  The writes at 27, 28, 29, 30 seconds never happened.  The
remote site did use "sackOK", so I'll fight the temptation to make
some disparaging remarks about win98...

Anyone have any clues why poll would indicate a not-ready-for-write
status?  This is driving me bonkers.

-wolfgang
-- 
       Wolfgang Rupprecht <wolfgang+gnus@dailyplanet.wsrcc.com>
		    http://www.wsrcc.com/wolfgang/
DGPS signals via the Internet  http://www.wsrcc.com/wolfgang/gps/dgps-ip.html