Subject: Re: connection bonding?
To: der Mouse <mouse@Rodents.Montreal.QC.CA>
From: Steven M. Bellovin <smb@cs.columbia.edu>
List: tech-net
Date: 12/07/2005 22:42:43
In message <200512080313.WAA21954@Sparkle.Rodents.Montreal.QC.CA>, der Mouse writes:
>> The agr(4) man page hints at the issue.  The problem is that you
>> *really* don't want TCP segments from a single connection arriving
>> out of order.  While TCP semantics guarantee that things will work,
>> it will cause a tremendous performance hit.
>
>Surely that's a (performance) bug in the TCP stacks?  The network has
>never promised it won't reorder packets, and indeed when I traceroute I
>see packets only a second apart taking different paths in terms of
>router addresses (surely going down different wires to different
>next-hops is even more reordering-prone than going down different wires
>to the same next-hop?).

Right, which is why that isn't done.  A useful heuristic is that all 
packets for the same TCP connection should go out the same 
interface at each hop.  Hashing each packet's connection identifiers 
(addresses and ports) to pick the interface is a stateless 
approximation of what you want.
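To make the heuristic concrete, here is a minimal Python sketch.  It is 
not agr(4)'s actual code; the function name pick_interface and the use 
of CRC-32 over a text key are assumptions for illustration.  Because the 
choice depends only on header fields, no per-flow state is kept, and 
every packet of a connection lands on the same member link.

```python
import zlib


def pick_interface(src_ip, dst_ip, proto, src_port, dst_port, n_links):
    """Hypothetical per-flow link selector: hash the 5-tuple, mod n_links.

    Stateless: the result depends only on header fields, so every packet
    of one TCP connection maps to the same link and the aggregation layer
    cannot reorder it against its siblings.
    """
    # zlib.crc32 is deterministic across runs, unlike Python's salted
    # built-in hash() on strings.
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    return zlib.crc32(key) % n_links


if __name__ == "__main__":
    flow = ("192.0.2.1", "198.51.100.7", 6, 49152, 80)
    # The same flow yields the same link index every time.
    print([pick_interface(*flow, n_links=4) for _ in range(5)])
```

The cost of this design is that a single large connection can never use 
more than one link's worth of bandwidth, and an unlucky hash can leave 
the links unevenly loaded.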
>
>> In particular, if a sender receives 3 duplicate ACKs in a row, it
>> slams its congestion window shut and restarts the whole slow start
>> business.
>
>Wouldn't it make more sense to either be less hair-trigger in the
>congestion detection algorithms or also implement some kind of
>reordering detection to avoid this?  Assuming the network won't reorder
>packets really strikes me as a modern version of the "everything is on
>a fast LAN" problem that many Berkeley network programs suffered from,
>and which got smoked out once IP started getting used over longer
>paths.

It's a research problem, i.e., the TCP performance mafia doesn't know 
the answer yet, as best I can tell.  See, for example, Ethan Blanton 
and Mark Allman, "On Making TCP More Robust to Packet Reordering," ACM 
Computer Communication Review 32(1), January 2002, 
http://www.icir.org/mallman/papers/tcp-reorder-ccr.ps; also see 
http://www.icir.org/mallman/papers/draft-ietf-tcpm-tcp-dcr-06.txt 
for *proposed* TCP behavior changes to make it better.
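The duplicate-ACK arithmetic is easy to see in a toy model.  The Python 
sketch below assumes a receiver that ACKs every segment immediately (no 
delayed ACKs), counts segments rather than bytes, loses nothing, and uses 
the classic threshold of 3; the function names are hypothetical.  A 
segment overtaken by k later segments produces k duplicate ACKs, so any 
displacement of 3 or more looks exactly like a loss to the sender.

```python
DUPTHRESH = 3  # classic fast-retransmit threshold (RFC 5681)


def max_dup_acks(arrival_order):
    """Longest run of duplicate cumulative ACKs for a given arrival order.

    Segments are numbered 0..n-1; the receiver (assumed to ACK every
    segment) answers each arrival with the next in-order segment it expects.
    """
    received = set()
    next_expected = 0
    last_ack = 0  # the sender already holds an ACK saying "expect 0"
    dups = max_dups = 0
    for seg in arrival_order:
        received.add(seg)
        while next_expected in received:
            next_expected += 1
        if next_expected == last_ack:
            dups += 1                  # same ACK again: a duplicate
            max_dups = max(max_dups, dups)
        else:
            last_ack = next_expected   # cumulative ACK advanced
            dups = 0
    return max_dups


def triggers_fast_retransmit(arrival_order, dupthresh=DUPTHRESH):
    """True if reordering alone would make the sender retransmit."""
    return max_dup_acks(arrival_order) >= dupthresh


if __name__ == "__main__":
    print(triggers_fast_retransmit([1, 0, 2, 3, 4, 5]))  # False: 1 dup ACK
    print(triggers_fast_retransmit([1, 2, 3, 4, 0, 5]))  # True: 4 dup ACKs
```

Raising the threshold would tolerate more reordering, but it also delays 
recovery from genuine losses, which is part of why this remains open.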
>
>If none of those are considered usable for some reason, then I guess
>the world needs a link aggregation design that preserves packet order
>but still load-balances sanely.  In NetBSD kernel terms, I wonder if
>all the member interfaces could be made to share the same output queue?
>You'd mostly get lack of reordering *and* still have real
>load-balancing, it seems to me.
>
That's really hard, because of the mix of TCP packet sizes.  Most bytes 
are carried by max-MTU packets, but 40-50% of packets are 40- or 44-byte 
ACKs.  Even at the same line speed, a full-size packet takes ~10-35x as 
long to transmit as an ACK, so a small packet started on one link just 
after a large one started on another can finish first.  That leaves 
plenty of room for reordering.
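A toy simulation shows the effect.  The sketch assumes equal-speed links 
fed from one shared FIFO, all packets queued at time zero in sending 
order, and each packet taken by whichever link frees up first; the 
100 Mb/s rate and the function name delivery_order are illustrative 
assumptions.  With two links, a 1500-byte segment followed by a 40-byte 
packet leaves in reverse order: the small one starts on the idle link 
and finishes after 3.2 µs, while the large one needs 120 µs.

```python
def delivery_order(sizes, n_links, rate_bps):
    """Toy model of n equal-speed links sharing one output FIFO.

    All packets are assumed queued at t=0 in sending order; each is taken
    from the head of the queue by whichever link becomes free first.
    Returns packet indices in the order their last bit is transmitted.
    """
    link_free = [0.0] * n_links          # time at which each link goes idle
    finished = []
    for i, size in enumerate(sizes):
        link = min(range(n_links), key=lambda k: link_free[k])
        end = link_free[link] + size * 8 / rate_bps   # serialization delay
        link_free[link] = end
        finished.append((end, i))
    # Sort by completion time; ties keep sending order via the index.
    return [i for _, i in sorted(finished)]


if __name__ == "__main__":
    print(delivery_order([1500, 40], n_links=2, rate_bps=100e6))  # [1, 0]
    print(delivery_order([1500, 40], n_links=1, rate_bps=100e6))  # [0, 1]
```

With per-flow hashing instead, both packets would share one link and stay 
in order, at the cost of the finer-grained load balancing a shared queue 
was meant to provide.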

		--Steven M. Bellovin, http://www.cs.columbia.edu/~smb