Subject: Re: TCP ACK convoying....
To: Jonathan Stone <jonathan@dsg.stanford.edu>
From: Bill Sommerfeld <sommerfeld@orchard.arlington.ma.us>
List: tech-net
Date: 04/07/1998 15:37:42
Ok, here's my theory on the convoying behavior seen over the loopback
interface.  I believe it's an artifact of the
send-two-packets-for-each-ack behavior during slow-start combined with
FIFO buffering in the loopback interface.

The following is written assuming ack-every-segment (no delayed ack);
generalizing it to ack-every-other-segment is left as an exercise for
the reader.

`A' is the first data packet
`a' is the ack packet for `A', and so on (`B'/`b', `C'/`c', ...).

We process the head of the queue in turn, and put newly-generated
packets on the tail in response. (I'm assuming that there's enough
buffering on the transmit and receive sockets that we can ignore any
context-switches to those processes to get more data/dispose of the
data).

The queue thus looks something like this over time:

	A		(send first data packet)
	a		(process it, send first ack)
	BC		(process ack 1, send data 2, 3 back-to-back)
	Cb		(process data 2, send ack 2)
	bc		(process data 3, send ack 3)
	cDE		(process ack 2, send data 4, 5 back-to-back)
	DEFG		(process ack 3, send data 6, 7 back-to-back)
	EFGd		...
	FGde
	Gdef
	defg
	efgHI
	fgHIJK
	gHIJKLM
	HIJKLMNO

So, during slow start, we alternate between generating bursts of acks
and bursts of data packets, because the same `process' (netisr) is
generating and processing both streams of data.  Since the loopback
interface never reorders packets, once you build these convoys, they
pretty much stay that way, assuming the traffic sources and sinks are
fast enough...
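
As a sanity check, the queue evolution traced above can be reproduced
with a few lines of Python.  This is only a toy model of the rules
described here (one FIFO, one ack per data packet, two data packets
per ack); it is not kernel code, and simulate() and its variable names
are made up for illustration.

```python
from collections import deque
from string import ascii_uppercase


def simulate(steps):
    """Return the FIFO contents after each processing step.

    Toy model (an assumption, not kernel code): uppercase letters are
    data packets, lowercase letters are the matching acks.  Only 26
    data labels exist, so keep `steps` small.
    """
    labels = iter(ascii_uppercase)      # data packets are named A, B, C, ...
    queue = deque([next(labels)])       # sender starts with one data packet
    trace = ["".join(queue)]
    for _ in range(steps):
        pkt = queue.popleft()           # process the head of the queue
        if pkt.isupper():
            queue.append(pkt.lower())   # data in: one ack goes on the tail
        else:
            queue.append(next(labels))  # ack in: slow start sends two
            queue.append(next(labels))  # data packets back-to-back
        trace.append("".join(queue))
    return trace


if __name__ == "__main__":
    for contents in simulate(14):
        print(contents)
```

Running it for 14 steps walks through the same queue states as the
trace above, ending at HIJKLMNO.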

It's worth noting that the BPF tap on the loopback interface is only
on the output side.  To aid understanding here, it might be
informative to embed a second bpf_mtap() call in ipintr() just to get
timestamps on when packets were dequeued prior to processing.

I don't see how this generalizes to traffic between multiple systems,
where you have a different queue in each direction and two processes
chewing through the packets instead of one.  A test case which
demonstrates it *in the non-loopback case* would be helpful.
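
One way to make such a test case checkable is to measure run lengths
of same-kind packets in a capture.  The sketch below assumes the
capture has already been reduced to a string of `D' (data) and `A'
(ack) characters in the order the packets were seen; that encoding and
the run_lengths() helper are hypothetical, chosen only for
illustration.

```python
from itertools import groupby


def run_lengths(kinds):
    """Collapse a sequence of packet kinds into (kind, run length) pairs.

    `kinds` is assumed to be an iterable such as "DADDAA", where 'D'
    marks a data packet and 'A' an ack (a made-up encoding).
    """
    return [(kind, sum(1 for _ in group)) for kind, group in groupby(kinds)]


if __name__ == "__main__":
    # Dequeue order from the loopback model: A a B C b c D E F G d e f g
    print(run_lengths("DADDAADDDDAAAA"))
```

In the loopback model the runs double each round (1, 1, 2, 2, 4, 4);
a capture between two real hosts that showed similarly growing runs
would be evidence of the same convoying.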

					- Bill