Subject: Re: perhaps time to check our TCP against spec?
To: Jonathan Stone <jonathan@dsg.stanford.edu>
From: Jason Thorpe <thorpej@nas.nasa.gov>
List: tech-net
Date: 04/06/1998 18:26:38
On Mon, 6 Apr 1998 17:42:34 -0700 (PDT) 
 Jonathan Stone <jonathan@DSG.Stanford.EDU> wrote:

 > Using the enclosed, trivial, `do-ttcp' script, do:
 > 
 > 	tcpdump -w <outputfile> -i lo0 &
 > 	do-ttcp  -r &
 > 	do-ttcp  -t 127.0.0.1
 > 
 > Then look at the MSS advertised.  I'm seeing this on a machine with
 > -current as of about a week ago.  

Um, I'm not talking about the advertised MSS "problem".  I'm talking
about the "badly timed ACKs" problem you are claiming exists in our
TCP.

 > But Jason, I have *already* sent excerpts from the tcpdump traces which show
 > there is a problem here.  You asked for evidence. You got it.  What
 > does it take to convince you you're wrong and that there *is* a
 > problem?

Yes, I saw your tcpdump output, and I want to know how to reproduce
this problem MYSELF.  Above you said "excerpts".  I don't want excerpts.
I want a way to make the problem occur on my own lab systems (so that I
can eliminate the variables and find the real problem, assuming one exists),
so I can fully understand it, and then fix it.

 > If you want more, you can come up to the mid-peninsula to my
 > grad-student apartment and watch it happening. Or arrange to visit my
 > lab at Stanford.  And see the traces going back to, if I still have
 > them, summer 1996.  As you choose.

Perhaps if you could describe the exact conditions under which this
occurs, I could help you better.  Also, why would I be interested in
traces taken in mid-1996?  Our TCP has undergone serious work since
then.

 > But Jason, I already showed traces which show there *is* a problem of
 > some kind.  What is this ``if there's really a problem'' nonsense?
 > Are you saying I'm faking the data?

No, I'm not.  I AM saying that I want to know how you reproduce the problem,
so that I can narrow down where I start looking, and perhaps make it occur
on my own test and development systems.

Let's get real, here... mailing me a snippet of a tcpdump doesn't tell
me much, without some context.  If you sent something like that to any
commercial UNIX vendor, they'd probably want some more information.  Why
am I any different?

 > >Actually, I happen to disagree with your analysis of the "problem".
 > 
 > Jason, there are two separate issues here: convoying of data packets
 > and convoying of acks, and MTU issues.

Jonathan, I KNOW THERE ARE TWO SEPARATE ISSUES HERE!  And, for the record,
I am NOT confusing the two.

 > On the MTU/MSS computation: one of the places where the MTU is wrong
 > is loopback interfaces.  For traffic which *is* over the loopback
 > interface, ignoring the loopback MTU is just silly.  If the RFCs say
 > you should, they're broken and should be fixed.  I don't see any room
 > for argument there.
 > 
 > 
 > >When a host advertises an MSS to the peer, the recommended value is:
 > >
 > >        Largest MTU of any physical interface with an IP address
 > >        assigned to it minus the size of the TCP + IP headers.
 > >
 > >This recommendation is specifically designed to leave the loopback
 > >interface out of the computation!  
 > 
 > Some comments:
 > 
 >    A) you give me a hard time about not showing evidence, and
 >       then you quote this without attributing it?  Hypocrite.

Go ahead and call names if you like.  The fact of the matter is that
when I wrote that mail, I was in the middle of handling a fairly nasty
fire, trying to fix a resource which you yourself use.  If you like,
I can let that resource sit and go quote the line number.  Quite honestly,
I think I'm doing better by saying - "It's in an RFC, but I don't have time
to look up the exact section right at this very moment" than you are by not
telling me how to reproduce a bug you are claiming exists on our TCP.

 >    b) If it's from the Path MTU RFCs (RFC 1191), do you think it's
 >       really a good engineering decision to apply those to hosts
 >       which are *not* doing PMTU?

Yes.  And the reason for this is:

	The MSS we advertise is what we are willing to receive.  We
	should always advertise the recommended value regardless of whether
	we are doing Path MTU Discovery, because the peer _may_ be doing
	Path MTU Discovery, and we don't want to defeat whatever the
	peer may be doing in this regard.

 >    c) Is PMTU actually shipping in NetBSD, either -current or 1.3.x?

"Yes."  It shipped in 1.3!  It was not, however, enabled by default.
Again, read source-changes once in a while.  You might learn something.

 >    d) Is it fair to say that applying PMTU-specific computations,
 >       on machines which aren't capable of doing PMTU, is a bug?

See my response to (b) above.

 >    e) Here's a topology and a scenario where I think NetBSD's
 >       existing implementation computes bad MSS values
 >      (apologies to those who've seen it before):
 > 
 > .... HIPPI subnet                           FDDI ring .....
 >       |                                      |
 >       A                                      B
 >       |                                      |
 >       --------------- Ethernet ---------------
 > 
 > A and B have a connection to the same common Ethernet segment.  Each
 > machine has a connection to a higher-MTU subnet as well, and the
 > higher-MTU subnets are not connected.
 > 
 > Now, suppose that A or B (or both) are machines running NetBSD, which
 > are not doing PMTU, but do have the in_maxmtu computation
 > enabled. Exactly what does the existing behaviour do in released
 > versions of NetBSD?

in_maxmtu is always computed, for the record.

For your example, let's assume the following (which happens to be true):

	ETHERMTU < FDDIMTU < HIPPIMTU

Now, let's run a few permutations:

	Case 1: Host A and Host B both have IP addresses assigned
	ONLY to the Ethernet interfaces.

		A MSS to advertise = ETHERMTU - sizeof(struct tcpiphdr)
		B MSS to advertise = ETHERMTU - sizeof(struct tcpiphdr)

	Case 2: Host A has an IP address assigned to its Ethernet
	and HIPPI interfaces.  Host B has an IP address assigned only
	to its Ethernet interface.

		A MSS to advertise = HIPPIMTU - sizeof(struct tcpiphdr)
		B MSS to advertise = ETHERMTU - sizeof(struct tcpiphdr)

	Case 3: Host A has an IP address assigned to its Ethernet
	and HIPPI interfaces.  Host B has an IP address assigned to
	its Ethernet and FDDI interfaces.

		A MSS to advertise = HIPPIMTU - sizeof(struct tcpiphdr)
		B MSS to advertise = FDDIMTU - sizeof(struct tcpiphdr)

"Where's the problem?"
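The three cases are plain arithmetic. The sketch below assumes illustrative MTUs: ETHERMTU is 1500, while FDDIMTU (4352, the RFC 1390 default for IP over FDDI) and HIPPIMTU (65280, the RFC 2067 MTU for IP over HIPPI) may not match the constants in NetBSD's headers. mss_for() is a hypothetical helper that takes the MTUs of a host's addressed interfaces only.

```c
#include <assert.h>
#include <stddef.h>

#define TCPIP_HDR_LEN 40   /* sizeof(struct tcpiphdr): 20-byte IP + 20-byte TCP */

/* Illustrative MTUs; the real kernel constants may differ. */
#define ETHERMTU   1500
#define FDDIMTU    4352    /* RFC 1390 default for IP over FDDI */
#define HIPPIMTU  65280    /* RFC 2067 MTU for IP over HIPPI */

/*
 * Hypothetical helper: given the MTUs of the interfaces that have an IP
 * address on one host, return the MSS that host advertises.
 */
unsigned int mss_for(const unsigned int *addressed_mtus, size_t n)
{
    unsigned int max_mtu = 0;

    assert(addressed_mtus != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (addressed_mtus[i] > max_mtu)
            max_mtu = addressed_mtus[i];  /* keep the largest addressed MTU */
    return max_mtu > TCPIP_HDR_LEN ? max_mtu - TCPIP_HDR_LEN : 0;
}
```

With these values, Case 1 gives 1460 for both hosts, Case 2 gives 65240 for A and 1460 for B, and Case 3 gives 65240 for A and 4312 for B.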

 > OK. So stipulated. Now, what does this imply for hosts which aren't
 > doing PMTU or don't implement all of PMTU properly?  What's
 > the right thing to do there and in scenarios like the above?

Jonathan, get a clue.  PATH MTU DISCOVERY HAS NOTHING TO DO WITH THE
MSS WE ADVERTISE TO THE PEER!  Well, it does, but only so much as it
enables the PEER to use the larger segment size if the PEER is doing
Path MTU Discovery.  Whether or not our side of the connection is doing
Path MTU Discovery should have absolutely zero bearing on this issue.

 > Here's another case:
 >                                                   
 >    Metricom Radio                     Metricom Radio        			
 >    < 1200 byte MTU                    < 1200 byte MTU       
 >    |                                   |                     
 >    |                                   |                     
 >   machine A                           machine B
 >    |                                   |                     
 >    Ethernet NIC,                       Ethernet NIC,         
 >   currently no                        currently no           
 >     external                            external             
 >   connectivity                        connectivity
 > 
 > 
 > (The Ethernet NICs could be connected to an isolated home LAN, or to
 > terminators.)  Now, suppose machine A wants to establish a TCP
 > connection to machine B over the wireless radio.  Again, suppose that
 > both machines have the extant NetBSD code and aren't doing PMTU.
 >
 > What MSS will these machines compute and advertise?  Will it result in
 > fragmentation?  That's a really horrid thing to do to the Metricom
 > radio, where there's a limit of about 10 packets/sec irrespective of
 > size.

Both will advertise an MSS of ETHERMTU - sizeof(struct tcpiphdr), because
ETHERMTU > METRICOMMTU, assuming the Ethernet interface has an IP address
assigned to it.  If not, they will advertise METRICOMMTU -
sizeof(struct tcpiphdr).  "Where's the problem?"
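The same rule applied to the Metricom example, as a minimal sketch: METRICOMMTU is a hypothetical value (the thread only says the radio MTU is below 1200 bytes), and metricom_host_mss() is an illustrative name. It models only the MSS each machine advertises, not how a sender sizes its segments on the radio link.

```c
#include <assert.h>
#include <stdbool.h>

#define TCPIP_HDR_LEN 40    /* sizeof(struct tcpiphdr): 20-byte IP + 20-byte TCP */
#define ETHERMTU      1500
#define METRICOMMTU   1100  /* hypothetical; the thread says only "< 1200" */

/*
 * MSS advertised by machine A or B in the Metricom scenario.  The radio
 * interface is assumed to always have an IP address; the Ethernet NIC
 * counts only when ether_has_addr is true.
 */
unsigned int metricom_host_mss(unsigned int radio_mtu, bool ether_has_addr)
{
    unsigned int max_mtu = radio_mtu;

    assert(radio_mtu > TCPIP_HDR_LEN);
    if (ether_has_addr && ETHERMTU > max_mtu)
        max_mtu = ETHERMTU;   /* largest addressed MTU wins */
    return max_mtu - TCPIP_HDR_LEN;
}
```

With the Ethernet NIC addressed, each machine advertises 1460; with only the radio addressed (and the assumed 1100-byte MTU), each advertises 1060.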

Jason R. Thorpe                                       thorpej@nas.nasa.gov
NASA Ames Research Center                            Home: +1 408 866 1912
NAS: M/S 258-5                                       Work: +1 650 604 0935
Moffett Field, CA 94035                             Pager: +1 415 428 6939