Subject: Re: fragmentation by NetBSD routers vs. reassembly on other systems....
To: NetBSD Networking Technical Discussion List <tech-net@netbsd.org>
From: Greg A. Woods <woods@weird.com>
List: tech-net
Date: 09/02/2000 12:50:54
[ On Saturday, September 2, 2000 at 18:07:08 (+1100), Robert Elz wrote: ]
> Subject: Re: fragmentation by NetBSD routers vs. reassembly on other systems.... 
>
>     Date:        Sat,  2 Sep 2000 02:46:30 -0400 (EDT)
>     From:        woods@weird.com (Greg A. Woods)
>     Message-ID:  <20000902064630.A5A8F5@proven.weird.com>
> 
>   | The proof was clearly visible in the
>   | evidence collected by actual experimentation.
> 
> I have no idea what you have been experimenting with, but...
> 
> lavender$ sysctl net.inet.tcp.mssdflt
> net.inet.tcp.mssdflt = 512
> lavender$ telnet cvs.netbsd.org
> Trying 130.233.224.75...
> telnet: Unable to connect to remote host: Connection refused
> 
> 17:58:50.397583 128.250.1.202.65489 > 130.233.224.75.23: S 845201410:845201410(0) win 16384 <mss 1460,[|tcp]>
> 17:58:50.917854 130.233.224.75.23 > 128.250.1.202.65489: R 0:0(0) ack 845201411 win 0 (DF)
> 
> Note the advertised MSS, and its relationship with the net.inet.tcp.mssdflt
> (128.250.1.202 is lavender).   Clearly it is not on the same net as 
> cvs.netbsd.org.   (That the telnet attempt was refused is no surprise).

Note also that mssdflt isn't going to do anything useful unless you also
turn off mtudisc (which is on by default in at least 1.4V and newer).
The only explanation for your observation is that you have mtudisc enabled.

My experiments were with the NetBSD-1.3.3 release (which does not have
mtudisc turned on by default) and various incarnations of MS-Windows
clients (95, 98, and 2000) that were connected via high-bandwidth,
low-latency pipes and which were at least one routing hop away from the
server.  (Indeed they must be at least one routing hop away, and on
a different *network* if `net.inet.ip.subnetsarelocal' is also turned
on!)

An HTTP transfer by such a client from a NetBSD-1.3.3 server in its
default configuration causes the server to send 512-byte data segments.
After I increased mssdflt to 1460, maximum-sized packets were
transmitted, allowing the TCP connection to make the best possible use
of the available bandwidth and to achieve a significant percentage of
the theoretical maximum transmission speed.

Even changing mssdflt after a connection was already open caused the
packets to immediately get larger (and of course throughput
performance to increase proportionally).  This latter observation
(pleasantly) surprised me because I didn't expect it to happen.

> I know - I also acted without thinking, and tried fiddling with that when
> I was experiencing a problem with a host that had broken PMTU discovery
> (from home, I use a very low MTU) - I thought if I set the MSS real low,
> I could cause the remote host to send small packets, and its broken PMTU
> discovery would stop bothering me.   And I expect it would have, except
> that net.inet.tcp.mssdflt doesn't do that.   It does just what jhawk said
> it does.   I read the code...

There may be various versions of the code in place.  I've been reading
primarily in Stevens Vol. 2 and comparing with NetBSD when I observe
behaviour different from what I've expected from the code.

It is fairly obvious from Stevens (both his commentary and examples
in 18.4, and his description of why 4.4BSD doesn't conform to the
corresponding "SHOULD" in RFC-1122), and verified by my direct
observations, that when Path-MTU-discovery is turned off the default
mssdflt setting of 512 is designed only to generate packets that will
fit through the average SLIP (or PPP) pipe.  Though the proper
protocol default should be 536, the *BSD default of 512 will probably
allow for more effective memory allocation.  I am also reasonably
certain that the rationale for this "SHOULD" in RFC-1122 (there isn't
one printed right there, but it may be in an older RFC that I've not
recently read) is exactly this reason too.
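
For reference, the 536 figure falls out of simple header arithmetic: it
is the 576-byte datagram that every host must accept, less the headers.
A minimal sketch in Python, assuming 20-byte IPv4 and 20-byte TCP
headers with no options (the name mss_from_mtu is purely illustrative,
not anything from the NetBSD sources):

```python
IP_HEADER = 20   # bytes, IPv4 header without options (assumption)
TCP_HEADER = 20  # bytes, TCP header without options (assumption)

def mss_from_mtu(mtu):
    """Largest TCP payload that fits in one unfragmented datagram of size mtu."""
    return mtu - IP_HEADER - TCP_HEADER

print(mss_from_mtu(576))   # 536, the protocol default
print(mss_from_mtu(1500))  # 1460, a full-sized Ethernet frame
```

The same subtraction gives the 1460 upper bound for an Ethernet-attached
server; 512 is simply the largest power of two that fits under 536.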

However, since there are no longer many such pipes that are not at the
end points (i.e. where the end systems will obviously be able to choose
the correct MSS based on their local interface MTU), it is clearly safe
(and, as per my observations, also critical to meeting performance
expectations) to increase the mssdflt on any server to be at least 1400
(but no more than 1460).  This is true, as verified by experimentation,
not just for 100-Mbps connected servers, but also for 10-Mbps connected
servers.  I don't know yet if there's a relationship between the
interface speed and the available bandwidth, nor do I have any
measurements yet to show how this affects the network, especially in
congested conditions, though I would expect it only to be positive since
it causes fewer total bytes to be transmitted.  Note that there is some
hint in Stevens (ch.24) that larger packets are not always better on
multi-hop connections, though when the difference is between 512 and
1400 bytes my observations don't exactly match Stevens (and Bellovin).
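
The "fewer total bytes" expectation can be put in rough numbers.  The
following Python sketch is only a back-of-the-envelope estimate: it
assumes 40 bytes of IPv4 + TCP header per segment with no options, and
ignores link-layer framing, ACKs, and retransmissions.  The function
name wire_bytes and the 1,000,000-byte payload are made up for
illustration.

```python
HEADERS = 40  # bytes of IPv4 + TCP header per segment, no options (assumption)

def wire_bytes(payload, mss):
    """Return (segments, total bytes including headers) needed to carry payload."""
    segments = -(-payload // mss)  # ceiling division
    return segments, payload + segments * HEADERS

for mss in (512, 1400, 1460):
    segments, total = wire_bytes(1_000_000, mss)
    print(mss, segments, total)
```

Under those assumptions a 512-byte MSS needs 1954 segments and about
78 KB of headers for the transfer, while 1460 needs 685 segments and
about 27 KB.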

Anyway this is really only a side issue to the central problem I've
observed.....

-- 
							Greg A. Woods

+1 416 218-0098      VE3TCP      <gwoods@acm.org>      <robohack!woods>
Planix, Inc. <woods@planix.com>; Secrets of the Weird <woods@weird.com>