Subject: kern/3754: TCP: "PSH" bit causes immediate ACK
To: None <gnats-bugs@gnats.netbsd.org>
From: None <davide+@cs.cmu.edu>
List: netbsd-bugs
Date: 06/16/1997 23:02:16
>Number:         3754
>Category:       kern
>Synopsis:       TCP "push" triggers wrong acknowledgement behavior
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people (Kernel Bug People)
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Jun 16 20:05:00 1997
>Last-Modified:
>Originator:     David Eckhardt
>Organization:
Carnegie Mellon University Computer Science Department
>Release:        1.2
>Environment:
	System: NetBSD piper.nectar.cs.cmu.edu 1.1 NetBSD 1.1 (PIPER) #5: Tue Mar 18 17:50:12 EST 1997 davide@piper.nectar.cs.cmu.edu:/usr/davide/cmucs-netbsd/src/sys/arch/i386/compile/PIPER i386

>Description:
	In NetBSD TCP, in tcp_input.c:tcp_input(), if the incoming TCP packet
	has the PSH/TH_PUSH/"push" flag on, TF_ACKNOW (generate an ack
	*immediately*) is set.  While that is not prohibited by RFC 793
	or 1122, I can't see where that is suggested, either ("push"
	means "awaken the receiving process when you deliver this packet"),
	and it results in the following bad behavior:

	Since essentially all TELNET typing packets have the "push" bit on,
	TCP will immediately generate an acknowledgement.  Meanwhile, the
	character moves through telnetd and the user's application, is
	echoed, and the echo moves back through telnetd into tcp_output().
	At this point, one of two things can happen:  either tcp_output()
	will send the echo out in a packet, or it will hold onto it and
	piggyback it on the ack of the user's next character.  Either one is bad:
	in the first case, two packets are generated where one will do;
	in the second case, the user sees each character echoed only after
	typing the next one (that is, after a delay of several times
	the human-visible 50-millisecond threshold).  Since the echo of
	the last character of a burst has nothing to piggyback on,
	it sits until a slow timeout takes place.
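	The three outcomes can be compared with a small counting model.
	This is a hypothetical sketch, not kernel code.  It assumes every
	keystroke arrives as a single 1-byte segment and that the server
	echoes every byte, and it ignores timers and round-trip times.
	The names echo_regime, echo_stats and simulate_burst are made up
	for illustration.

```c
#include <stddef.h>

/* Hypothetical model of the server behaviors described above. */
enum echo_regime {
    ACK_NOW_ECHO_NOW,      /* PSH forces an ack; echo goes out in its own packet */
    ACK_NOW_ECHO_HELD,     /* PSH forces an ack; echo waits for the next ack */
    DELAYED_ACK_PIGGYBACK  /* ack is delayed until the echo is ready */
};

struct echo_stats {
    size_t server_packets; /* segments sent by the server for the burst */
    size_t late_echoes;    /* echoes sent only after a later keystroke or a timeout */
};

/* Count server packets for a burst of `keystrokes` echoed characters. */
struct echo_stats simulate_burst(enum echo_regime r, size_t keystrokes)
{
    struct echo_stats s = {0, 0};

    if (keystrokes == 0)
        return s;
    switch (r) {
    case ACK_NOW_ECHO_NOW:
        /* One pure ack plus one echo segment per keystroke. */
        s.server_packets = 2 * keystrokes;
        break;
    case ACK_NOW_ECHO_HELD:
        /* Each keystroke triggers an ack carrying the previous echo
         * (the first ack carries nothing); the last echo needs one
         * extra, timer-driven packet.  Every echo is therefore late. */
        s.server_packets = keystrokes + 1;
        s.late_echoes = keystrokes;
        break;
    case DELAYED_ACK_PIGGYBACK:
        /* The ack rides on the echo: one packet per keystroke. */
        s.server_packets = keystrokes;
        break;
    }
    return s;
}
```

	For a six-character burst the model gives 12 server packets when
	the ack and echo are sent separately, 7 when echoes are held (all
	six echoes late), and 6 with delayed, piggybacked acks.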

	I believe that this behavior has been overlooked because,
	on an Ethernet, the round trip time is so low that tcp_output
	always sends two packets (though I have not verified this).
	On the other hand, it is not only perceptible but annoying
	over a 14.4 kbit/s dialup (or, of course, the busy-network equivalent).
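	A back-of-envelope figure shows why the extra packet matters on a
	dialup.  Assuming a 1-byte segment with 20-byte IP and 20-byte TCP
	headers (41 bytes), no header compression, and no SLIP/PPP framing
	or modem compression, each extra segment costs about 23 ms of line
	time at 14.4 kbit/s, versus about 0.03 ms on a 10 Mbit/s Ethernet
	(ignoring the Ethernet minimum frame size).  The helper name is
	hypothetical.

```c
/* Time, in milliseconds, to clock `bytes` onto a link running at `bps`
 * bits per second.  Ignores framing, compression and queueing. */
double serialization_ms(unsigned bytes, unsigned bps)
{
    return (double)bytes * 8.0 * 1000.0 / (double)bps;
}
```

	For example, serialization_ms(41, 14400) is 328000 / 14400, or
	roughly 22.8 ms per extra pure-ack packet in each direction it
	travels.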

	Meanwhile, Solaris and OSF/1 (at least) do not exhibit this
	behavior.  This change seems to have taken place between
	4.4-beta @(#)tcp_input.c 8.2 (Berkeley) 8/10/93 and NetBSD 1.0
	@(#)tcp_input.c 8.5 (Berkeley) 4/10/94 (I don't have any
	intermediate source trees).  The 4.3 code had a patchable
	global policy variable (always ack immediately vs. always
	ack later), and 4.3-reno, 4.3-tahoe, 4.4-lite, and 4.4-beta
	all use delayed acks except under certain circumstances
	(such as a reassembly hole or connection open/close).
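	The difference between the two behaviors can be written as a pure
	decision function.  This is a simplified, hypothetical model, not a
	copy of tcp_input(): the policy names and struct seg_info are made
	up, the 4.3 global variable is not modeled, and "special" stands
	for the exceptions listed above (an out-of-order segment or
	reassembly hole, or SYN/FIN at connection open/close).  The TH_*
	values are the standard TCP header flag bits.

```c
#include <stdbool.h>

/* TCP header flag bits (RFC 793 header layout). */
#define TH_FIN  0x01
#define TH_SYN  0x02
#define TH_PUSH 0x08
#define TH_ACK  0x10

enum ack_decision { ACK_DELAYED, ACK_NOW };

/* Hypothetical policy selector; not a real kernel variable. */
enum ack_policy {
    POLICY_44,          /* reno/tahoe/4.4: delay unless a special case */
    POLICY_PUSH_ACKNOW  /* behavior reported here: PSH also forces an ack */
};

struct seg_info {
    unsigned char th_flags; /* flag bits from the arriving segment */
    bool in_order;          /* seq == rcv_nxt and no reassembly hole */
};

enum ack_decision choose_ack(enum ack_policy p, const struct seg_info *si)
{
    bool special = !si->in_order || (si->th_flags & (TH_SYN | TH_FIN));

    switch (p) {
    case POLICY_44:
        return special ? ACK_NOW : ACK_DELAYED;
    case POLICY_PUSH_ACKNOW:
        /* Every TELNET keystroke has PSH set, so this acks each one at once. */
        return (special || (si->th_flags & TH_PUSH)) ? ACK_NOW : ACK_DELAYED;
    }
    return ACK_NOW;
}
```

	An in-order TELNET keystroke (flags ACK|PSH) gets a delayed ack
	under POLICY_44 and an immediate ack under POLICY_PUSH_ACKNOW;
	a segment that leaves a reassembly hole is acked at once under
	both.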

>How-To-Repeat:
	I enclose below tcpdump output.  REBOOT.PC.CS.CMU.EDU is
	running Windows 95; PIPER.NECTAR.CS.CMU.EDU is running NetBSD 1.2,
	and GEAR.COMPOSE.CS.CMU.EDU is running some version of Solaris.
	The first packet from REBOOT is an ack of the echo of the
	carriage return that started tcpdump.

	Note that, for the first two characters, NetBSD sends back
	an ack and then, separately, an echo; for the third character,
	it sends back only an ack, and thereafter it echoes each
	character only when acking the next one, until the last
	character, which is echoed back much later.  The tcpdump was
	running on PIPER, the NetBSD machine, so the delays appear
	shorter than they actually are at the far end of a dialup
	link.  Solaris piggybacks the ack and the echo in all cases.

	tcpdump: listening on ep0
	17:57:42.795855 REBOOT.PC.CS.CMU.EDU.nterm > PIPER.NECTAR.CS.CMU.EDU.telnet: . ack 1527774498 win 8407 (DF) (ttl 31, id 57613)
	17:57:44.514481 REBOOT.PC.CS.CMU.EDU.nterm > PIPER.NECTAR.CS.CMU.EDU.telnet: P 0:1(1) ack 1 win 8407 (DF) (ttl 31, id 57869)
	17:57:44.514699 PIPER.NECTAR.CS.CMU.EDU.telnet > REBOOT.PC.CS.CMU.EDU.nterm: . ack 1 win 17520 [tos 0x10] (ttl 64, id 13515)
	17:57:44.515572 PIPER.NECTAR.CS.CMU.EDU.telnet > REBOOT.PC.CS.CMU.EDU.nterm: P 1:2(1) ack 1 win 17520 [tos 0x10] (ttl 64, id 13516)
	17:57:44.656560 REBOOT.PC.CS.CMU.EDU.nterm > PIPER.NECTAR.CS.CMU.EDU.telnet: P 1:2(1) ack 2 win 8406 (DF) (ttl 31, id 58125)
	17:57:44.656708 PIPER.NECTAR.CS.CMU.EDU.telnet > REBOOT.PC.CS.CMU.EDU.nterm: . ack 2 win 17519 [tos 0x10] (ttl 64, id 13517)
	17:57:44.657572 PIPER.NECTAR.CS.CMU.EDU.telnet > REBOOT.PC.CS.CMU.EDU.nterm: P 2:3(1) ack 2 win 17520 [tos 0x10] (ttl 64, id 13518)
	17:57:44.744486 REBOOT.PC.CS.CMU.EDU.nterm > PIPER.NECTAR.CS.CMU.EDU.telnet: P 2:3(1) ack 2 win 8406 (DF) (ttl 31, id 58381)
	17:57:44.744684 PIPER.NECTAR.CS.CMU.EDU.telnet > REBOOT.PC.CS.CMU.EDU.nterm: . ack 3 win 17520 [tos 0x10] (ttl 64, id 13519)
	17:57:44.837873 REBOOT.PC.CS.CMU.EDU.nterm > PIPER.NECTAR.CS.CMU.EDU.telnet: P 3:4(1) ack 3 win 8405 (DF) (ttl 31, id 58637)
	17:57:44.838007 PIPER.NECTAR.CS.CMU.EDU.telnet > REBOOT.PC.CS.CMU.EDU.nterm: P 3:4(1) ack 4 win 17519 [tos 0x10] (ttl 64, id 13520)
	17:57:44.934361 REBOOT.PC.CS.CMU.EDU.nterm > PIPER.NECTAR.CS.CMU.EDU.telnet: P 4:5(1) ack 4 win 8404 (DF) (ttl 31, id 58893)
	17:57:44.934494 PIPER.NECTAR.CS.CMU.EDU.telnet > REBOOT.PC.CS.CMU.EDU.nterm: P 4:5(1) ack 5 win 17519 [tos 0x10] (ttl 64, id 13521)
	17:57:45.025176 REBOOT.PC.CS.CMU.EDU.nterm > PIPER.NECTAR.CS.CMU.EDU.telnet: P 5:6(1) ack 5 win 8403 (DF) (ttl 31, id 59149)
	17:57:45.025313 PIPER.NECTAR.CS.CMU.EDU.telnet > REBOOT.PC.CS.CMU.EDU.nterm: P 5:6(1) ack 6 win 17519 [tos 0x10] (ttl 64, id 13522)
	17:57:45.348030 REBOOT.PC.CS.CMU.EDU.nterm > PIPER.NECTAR.CS.CMU.EDU.telnet: . ack 6 win 8402 (DF) (ttl 31, id 59405)
	17:57:45.348150 PIPER.NECTAR.CS.CMU.EDU.telnet > REBOOT.PC.CS.CMU.EDU.nterm: P 6:7(1) ack 6 win 17520 [tos 0x10] (ttl 64, id 13523)
	17:57:45.600165 REBOOT.PC.CS.CMU.EDU.nterm > PIPER.NECTAR.CS.CMU.EDU.telnet: . ack 7 win 8401 (DF) (ttl 31, id 59661)
	17:57:49.453893 REBOOT.PC.CS.CMU.EDU.1099 > GEAR.COMPOSE.CS.CMU.EDU.telnet: P 34745685:34745686(1) ack 1190016186 win 8575 (DF) (ttl 31, id 59917)
	17:57:49.455203 GEAR.COMPOSE.CS.CMU.EDU.telnet > REBOOT.PC.CS.CMU.EDU.1099: P 1:2(1) ack 1 win 4096 (ttl 60, id 22653)
	17:57:49.563163 REBOOT.PC.CS.CMU.EDU.1099 > GEAR.COMPOSE.CS.CMU.EDU.telnet: P 1:3(2) ack 2 win 8574 (DF) (ttl 31, id 60173)
	17:57:49.564442 GEAR.COMPOSE.CS.CMU.EDU.telnet > REBOOT.PC.CS.CMU.EDU.1099: P 2:4(2) ack 3 win 4096 (ttl 60, id 22654)
	17:57:49.657574 REBOOT.PC.CS.CMU.EDU.1099 > GEAR.COMPOSE.CS.CMU.EDU.telnet: P 3:4(1) ack 4 win 8572 (DF) (ttl 31, id 60429)
	17:57:49.658851 GEAR.COMPOSE.CS.CMU.EDU.telnet > REBOOT.PC.CS.CMU.EDU.1099: P 4:5(1) ack 4 win 4096 (ttl 60, id 22655)
	17:57:49.746409 REBOOT.PC.CS.CMU.EDU.1099 > GEAR.COMPOSE.CS.CMU.EDU.telnet: P 4:5(1) ack 5 win 8571 (DF) (ttl 31, id 60685)
	17:57:49.747657 GEAR.COMPOSE.CS.CMU.EDU.telnet > REBOOT.PC.CS.CMU.EDU.1099: P 5:6(1) ack 5 win 4096 (ttl 60, id 22656)
	17:57:49.836512 REBOOT.PC.CS.CMU.EDU.1099 > GEAR.COMPOSE.CS.CMU.EDU.telnet: P 5:6(1) ack 6 win 8570 (DF) (ttl 31, id 60941)
	17:57:49.837797 GEAR.COMPOSE.CS.CMU.EDU.telnet > REBOOT.PC.CS.CMU.EDU.1099: P 6:7(1) ack 6 win 4096 (ttl 60, id 22657)
	17:57:50.046013 REBOOT.PC.CS.CMU.EDU.1099 > GEAR.COMPOSE.CS.CMU.EDU.telnet: . ack 7 win 8569 (DF) (ttl 31, id 61197)

	630 packets received by filter
	0 packets dropped by kernel
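	The server-side difference can be checked mechanically by
	counting pure acks (segments with no payload) that each server
	sends.  The sketch below transcribes only the server-to-client
	segments from the dump above (timestamp and payload length); the
	struct and function names are hypothetical.

```c
#include <stddef.h>

/* One server-to-client segment: capture time and payload length. */
struct seg { const char *time; unsigned len; };

/* Count segments that carry no data, i.e. pure ACKs. */
size_t count_pure_acks(const struct seg *s, size_t n)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        if (s[i].len == 0)
            k++;
    return k;
}

/* PIPER (NetBSD) -> REBOOT segments from the dump above. */
static const struct seg piper_out[] = {
    {"17:57:44.514699", 0}, {"17:57:44.515572", 1},
    {"17:57:44.656708", 0}, {"17:57:44.657572", 1},
    {"17:57:44.744684", 0}, {"17:57:44.838007", 1},
    {"17:57:44.934494", 1}, {"17:57:45.025313", 1},
    {"17:57:45.348150", 1},
};

/* GEAR (Solaris) -> REBOOT segments from the same capture. */
static const struct seg gear_out[] = {
    {"17:57:49.455203", 1}, {"17:57:49.564442", 2},
    {"17:57:49.658851", 1}, {"17:57:49.747657", 1},
    {"17:57:49.837797", 1},
};
```

	PIPER sends 3 pure acks for 6 echoed characters; GEAR sends none.
	The dump also shows the tail delay directly: the last keystroke
	reaches PIPER at 17:57:45.025176 and is echoed at 17:57:45.348150,
	about 323 ms later, while GEAR echoes its last keystroke
	(17:57:49.836512) at 17:57:49.837797, about 1.3 ms later.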

>Fix:
	At the moment I have hacked my telnetd to set the TCP_NODELAY
	socket option, which forces my TCP into the 2-packet
	ack-then-echo regime.  The difference is very perceptible,
	even though this is far from the optimal behavior.
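	The workaround amounts to one setsockopt() call on the connected
	socket.  A minimal sketch, assuming a hypothetical helper
	set_tcp_nodelay() that telnetd would call on its network
	descriptor; TCP_NODELAY disables the small-segment hold in
	tcp_output(), so the echo is sent as soon as it is written
	instead of waiting for the next ack.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Turn on TCP_NODELAY for a TCP socket.
 * Returns 0 on success, or -1 with errno set (e.g. EBADF). */
int set_tcp_nodelay(int fd)
{
    int on = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof on);
}
```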

	But this is only a hack.  I don't understand the motivation
	behind reading PSH as "ack immediately", unless it was an
	attempt to obtain more accurate round trip time estimates,
	which is more properly done with the timestamp option.
	Interestingly enough, tcp_output() does not seem to set
	PSH except for exactly the case that RFC 793 suggests,
	namely transmitting the last data byte produced by the
	user.
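	The sending-side rule described above can be stated as a
	one-line predicate.  This is a simplified model under stated
	assumptions, not the tcp_output() source: `off` and `len`
	describe the bytes a segment carries from the send buffer, and
	`queued` is the number of bytes the user has written and not yet
	had acknowledged.  The function name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Set PSH when the segment carries data and reaches the end of what
 * the user has queued, i.e. it transmits the last byte written. */
bool should_set_push(size_t off, size_t len, size_t queued)
{
    return len > 0 && off + len == queued;
}
```

	A single echoed byte (off 0, len 1, queued 1) gets PSH; the first
	half of a 1024-byte write sent in 512-byte segments does not; a
	pure ack (len 0) never does.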

	I would suggest rolling back this change to the reno/tahoe/4.4
	behavior, possibly modified by turning on ACKNOW if the
	incoming packet contains a timestamp request (look around
	the test for "optlen == TCPOLEN_TSTAMP_APPA").
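	The suggested rule, with the optional timestamp tweak kept
	behind a flag, might look like the following.  This is a
	hypothetical sketch: struct rx_seg, proposed_ack() and the
	ack_on_timestamp switch are illustrative names, not existing
	kernel symbols.  The PSH bit is recorded but deliberately
	ignored.

```c
#include <stdbool.h>

enum ack_decision { ACK_DELAYED, ACK_NOW };

/* Hypothetical summary of an arriving data segment. */
struct rx_seg {
    bool in_order;      /* seq == rcv_nxt and reassembly queue empty */
    bool syn_or_fin;    /* connection open or close */
    bool has_timestamp; /* segment carried the timestamp option */
    bool push;          /* PSH bit: no longer affects the decision */
};

/* Delayed acks by default, as in reno/tahoe/4.4; immediate acks only
 * for a reassembly hole, open/close, or (optionally) a timestamp. */
enum ack_decision proposed_ack(const struct rx_seg *s, bool ack_on_timestamp)
{
    if (!s->in_order || s->syn_or_fin)
        return ACK_NOW;
    if (ack_on_timestamp && s->has_timestamp)
        return ACK_NOW;
    return ACK_DELAYED; /* let the ack piggyback on the echo */
}
```

	With this rule a pushed TELNET keystroke without a timestamp gets
	a delayed ack, so the echo can carry it.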

	In the meantime, you might want to hack telnetd the way
	I did, conditional on the appropriate kernel version.
>Audit-Trail:
>Unformatted: