Subject: Re: perhaps time to check our TCP against spec?
To: Jason Thorpe <thorpej@nas.nasa.gov>
From: Jonathan Stone <jonathan@DSG.Stanford.EDU>
List: tech-net
Date: 04/06/1998 17:42:34
>Tell me how I can reproduce this on my machine.  The exact command,
>please.

Using the enclosed, trivial, `do-ttcp' script, do:

	tcpdump -w <outputfile> -i lo0 &
	do-ttcp  -r &
	do-ttcp  -t 127.0.0.1

Then look at the MSS advertised.  I'm seeing this on a machine with
-current as of about a week ago.  
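
For anyone following along, one way to pull the advertised MSS out of the
saved capture is sketched below.  This is only an illustration: the
`extract_mss' helper is a hypothetical name, and it assumes tcpdump prints
the option in the form "mss <n>" on the SYN lines.

```sh
#!/bin/sh
# Sketch: list the MSS option carried by each SYN in a saved capture.
# Assumption: tcpdump prints TCP options in the form "mss <n>".

# extract_mss (hypothetical helper): print the number that follows
# "mss " on every input line that contains one.
extract_mss() {
    sed -n 's/.*mss \([0-9][0-9]*\).*/\1/p'
}

# Usage, reading the file written by "tcpdump -w" above:
#   tcpdump -n -r <outputfile> 'tcp[tcpflags] & tcp-syn != 0' | extract_mss
```

The filter keeps only segments with SYN set, since that is where the MSS
option is carried.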

The problems here are long-standing; they were discussed elsewhere just
days after the syn-cache code and the changes intended to support PMTU
went into -current.

# This is a shell archive.  Save it in a file, remove anything before
# this line, and then unpack it by entering "sh file".  Note, it may
# create directories; files and directories will be owned by you and
# have default permissions.
#
# This archive contains:
#
#	/usr/local/bin//do-ttcp
#
echo x - /usr/local/bin//do-ttcp
sed 's/^X//' >/usr/local/bin//do-ttcp << 'END-of-/usr/local/bin//do-ttcp'
X#!/bin/sh
Xexec ttcp -s -v -f m -b 65535 -l 16384 -n 20480 "$@"
END-of-/usr/local/bin//do-ttcp
exit

>FWIW, I have seen NO problems with "poorly timed ACKs", when using
>the loopback interface or otherwise.

But Jason, I have *already* posted excerpts from the tcpdump traces which
show there is a problem here.  You asked for evidence.  You got it.  What
does it take to convince you you're wrong and that there *is* a
problem?

If you want more, you can come up to the mid-peninsula to my
grad-student apartment and watch it happening. Or arrange to visit my
lab at Stanford.  And see the traces going back to, if I still have
them, summer 1996.  As you choose.


>If there's really a problem, I want to fix it.  

But Jason, I already showed traces which show there *is* a problem of
some kind.  What is this ``if there's really a problem'' nonsense?
Are you saying I'm faking the data?

But what I see is that this problem has been brought to your attention
before, and it still hasn't been fixed.  What does it take to get you



>Actually, I happen to disagree with your analysis of the "problem".

Jason, there are two separate issues here: convoying (of data packets
and of ACKs), and MTU issues.

On the MTU/MSS computation: one of the places where the MTU is wrong
is loopback interfaces.  For traffic which *is* over the loopback
interface, ignoring the loopback MTU is just silly.  If the RFCs say
you should, they're broken and should be fixed.  I don't see any room
for argument there.


>When a host advertises an MSS to the peer, the recommended value is:
>
>        Largest MTU of any physical interface with an IP address
>        assigned to it minus the size of the TCP + IP headers.
>
>This recommendation is specifically designed to leave the loopback
>interface out of the computation!  

Some comments:

   a) You give me a hard time about not showing evidence, and
      then you quote this without attributing it?  Hypocrite.

   b) If it's from the Path MTU RFCs (RFC 1191), do you think it's
      really a good engineering decision to apply those to hosts
      which are *not* doing PMTU?

   c) Is PMTU actually shipping in NetBSD, either -current or 1.3.x?

   d) Is it fair to say that applying PMTU-specific computations,
      on machines which aren't capable of doing PMTU, is a bug?

   e) Here's a topology and a scenario where I think NetBSD's
      existing implementation computes bad MSS values
      (apologies to those who've seen it before):

.... HIPPI subnet                           FDDI ring .....
      |                                      |
      A                                      B
      |                                      |
      --------------- Ethernet ---------------

A and B have a connection to the same common Ethernet segment.  Each
machine has a connection to a higher-MTU subnet as well, and the
higher-MTU subnets are not connected.

Now, suppose that A or B (or both) are machines running NetBSD, which
are not doing PMTU, but do have the in_maxmtu computation
enabled. Exactly what does the existing behaviour do in released
versions of NetBSD?
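
For illustration only, here is a minimal sketch of what the quoted
"largest interface MTU minus 40" rule would yield for this topology if
taken literally.  It is not NetBSD's code: `advertised_mss' is a
hypothetical name, and the MTU values (Ethernet 1500, FDDI 4352, HIPPI
65280) are assumed to be the usual defaults for those media.

```sh
#!/bin/sh
# Sketch of the "largest interface MTU minus 40" rule, taken literally.
# Assumptions: hypothetical helper name; 40 = 20-byte IP header plus
# 20-byte TCP header, no options; loopback is simply not passed in.

# advertised_mss MTU...: largest MTU among the arguments, minus 40.
advertised_mss() {
    max=0
    for mtu in "$@"; do
        if [ "$mtu" -gt "$max" ]; then
            max=$mtu
        fi
    done
    echo $((max - 40))
}

advertised_mss 1500 65280   # host A: Ethernet + HIPPI, prints 65240
advertised_mss 1500 4352    # host B: Ethernet + FDDI, prints 4312
advertised_mss 1500         # the shared Ethernet alone, prints 1460
```

Under those assumptions both hosts would advertise an MSS well above the
1460 bytes that fit in one datagram on the only network that actually
connects them.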

To steal part of RFC 1191 (Path MTU Discovery) which someone kindly
forwarded to me:

}   [...]  The MSS option should be 40 octets less than the
}   size of the largest datagram the host is able to reassemble (MMS_R,
}   as defined in [1]); in many cases, this will be the architectural
}   limit of 65495 (65535 - 40) octets.  A host MAY send an MSS value
}   derived from the MTU of its connected network (the maximum MTU over
}                                                  ^^^^^^^^^^^^^^^^^^^^
}   its connected networks, for a multi-homed host); this should not
}   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
}   cause problems for PMTU Discovery, and may dissuade a broken peer
}   from sending enormous datagrams.
}
}          Note: At the moment, we see no reason to send an MSS greater
}          than the maximum MTU of the connected networks, and we
}          recommend that hosts do not use 65495.  It is quite possible
}          that some IP implementations have sign-bit bugs that would be
}          tickled by unnecessary use of such a large MSS.


OK. So stipulated. Now, what does this imply for hosts which aren't
doing PMTU or don't implement all of PMTU properly?  What's
the right thing to do there and in scenarios like the above?

Here's another case:
                                                  
   Metricom Radio                     Metricom Radio        			
   < 1200 byte MTU                    < 1200 byte MTU       
   |                                   |                     
   |                                   |                     
  machine A                           machine B
   |                                   |                     
   Ethernet NIC,                       Ethernet NIC,         
  currently no                        currently no           
    external                            external             
  connectivity                        connectivity


(The Ethernet NICs could be connected to an isolated home LAN, or to
terminators.)  Now, suppose machine A wants to establish a TCP
connection to machine B over the wireless radio.  Again, suppose that
both machines have the extant NetBSD code and aren't doing PMTU.

What MSS will these machines compute and advertise?  Will it result in
fragmentation?  That's a really horrid thing to do to the Metricom
radio, where there's a limit of about 10 packets/sec irrespective of
size.
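
To make the cost concrete, here is a rough sketch of the fragment count
for one full-sized datagram.  The helper name `fragments_needed' is
hypothetical, the radio MTU is taken as exactly 1200 bytes (the figure
above only says it is below that), and a 20-byte IP header with no
options is assumed.

```sh
#!/bin/sh
# Sketch: how many IPv4 fragments one datagram needs on a given link.
# Assumptions: hypothetical helper name; 20-byte IP header, no options.

# fragments_needed DATAGRAM_LEN LINK_MTU
fragments_needed() {
    dgram=$1
    mtu=$2
    if [ "$dgram" -le "$mtu" ]; then
        echo 1
        return
    fi
    payload=$((dgram - 20))
    # Every fragment except the last carries a multiple of 8 bytes.
    per_frag=$(( (mtu - 20) / 8 * 8 ))
    echo $(( (payload + per_frag - 1) / per_frag ))
}

fragments_needed 1500 1200   # Ethernet-derived MSS of 1460: prints 2
fragments_needed 1200 1200   # datagram sized to the radio: prints 1
```

If the MSS were derived from the Ethernet MTU, each segment would cost
two radio packets, so a limit of about 10 packets/sec would carry only
about 5 segments/sec.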

And one more time: I am surprised to see there are still any bugs in
this area, because when you said they were fixed, I believed you.
Because you (and others) are doing a great job.  OK?  That should be
a compliment.

Instead, it turns into a flamewar, just because you cannot accept that
NetBSD has bugs even after seeing tcpdump traces _which_ _prove_
_those_ _bugs_ _do_ _exist_.