Subject: kern/2274: NetBSD's MTU/MSS handling is rather broken
To: None <gnats-bugs@NetBSD.ORG>
From: John Hawkinson <jhawk@mit.edu>
List: netbsd-bugs
Date: 03/30/1996 11:26:02
>Number: 2274
>Category: kern
>Synopsis: NetBSD's MTU/MSS handling is rather broken
>Confidential: no
>Severity: serious
>Priority: low
>Responsible: kern-bug-people (Kernel Bug People)
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Mar 30 11:50:01 1996
>Last-Modified:
>Originator: John Hawkinson
>Organization:
MIT SIPB
>Release: 1.1
>Environment:
System: NetBSD lola-granola 1.1A NetBSD 1.1A (LOLA) #2: Sun Mar 10 08:01:40 EST 1996 mycroft@zygorthian-space-raiders:/afs/sipb.mit.edu/project/netbsd/dev/current-source/build/i386_nbsd1/sys/arch/i386/compile/LOLA i386
>Description:
NetBSD's TCP maximum-segment-size (MSS) handling is broken. In my
environment, it uses an MSS of 512 to non-local destinations and an
MSS of 1460 to local destinations. This is, of course, suboptimal:
the local network is far more reliable and congestion-free than the
Internet, and the congested environment is exactly where you want to
minimize the number of packets sent.
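To put rough numbers on the packet-count argument, here is a minimal
sketch that counts the segments a transfer needs at each of the two
MSS values. The 1,000,000-byte transfer size and the function name are
assumptions chosen for illustration; TCP options and retransmissions
are ignored.

```python
def segments_needed(payload_bytes, mss):
    """Number of TCP segments needed to carry payload_bytes at a given MSS."""
    # Integer ceiling division: a partial final segment still costs a packet.
    return (payload_bytes + mss - 1) // mss

transfer = 1_000_000  # hypothetical transfer size in bytes

small = segments_needed(transfer, 512)    # MSS seen to non-local destinations
large = segments_needed(transfer, 1460)   # MSS seen to local destinations

print(small, large)
```

Under these assumptions the 512-byte MSS needs 1954 segments against
685 for 1460, i.e. nearly three times as many packets on the path that
can least afford them.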
>How-To-Repeat:
With tcp_mssdflt set to 512 (the default) and my routing table
as follows:
[lola-granola!jhawk] ~> netstat -rn
Routing tables
Internet:
Destination        Gateway            Flags   Refs      Use    Mtu  Interface
default            18.70.0.1          UGS       21  2451568      -  fe0
18.70              link#2             UC         0        0      -  fe0
18.70.0.1          0:0:c:5:a2:33      UHL        1     1348      -  fe0
18.70.0.6          8:0:20:74:0:98     UHL        0       10      -  fe0
18.70.0.26         127.0.0.1          UGHS       1     2206      -  lo0
18.70.0.36         0:40:95:4:fd:c8    UHL        0        2      -  fe0
18.70.0.54         2:60:8c:a9:f7:ae   UHL        2       94      -  lo0 =>
18.70.0.54         link#1             UC         0        0      -  ed0
18.70.0.56         8:0:20:22:22:70    UHL        0     9639      -  fe0
18.70.0.61         0:0:c0:b5:a8:d     UHL        2     3179      -  fe0
18.70.0.158        link#1             UCS        0        0      -  ed0
18.70.0.160        8:0:2b:2b:eb:3b    UHL        1       17      -  fe0
18.70.0.161        8:0:20:22:cf:21    UHL        0       21      -  fe0
18.70.0.215        8:0:20:1f:49:df    UHL        0    17484      -  fe0
18.70.0.216        link#1             UCS        0        0      -  ed0
18.70.0.218        8:0:20:75:3c:eb    UHL        0    19408      -  fe0
18.70.0.224        8:0:2b:e:f8:4      UHL        1      124      -  fe0
18.70.0.252        8:0:69:8:96:6f     UHL        1     2014      -  fe0
18.70.2.1          0:80:d3:a0:27:5f   UHL        1        4      -  fe0
127.0.0.1          127.0.0.1          UH         5   298684      -  lo0
(Note that the machine's IP address is 18.70.0.26 and that it is
subnetted with a mask of 255.255.0.0.)
If I attempt to connect to a machine on the local Ethernet
(18.70.0.252), an MSS of 1460 is used:
11:12:38.651510 LOLA-GRANOLA.MIT.EDU.1744 > OPUS.MIT.EDU.www: S 1714944000:1714944000(0) win 16384 <mss 1460,nop,wscale 0,nop,nop,timestamp 2630065 1985830193>
[lola-granola!jhawk] ~> route get opus
route to: OPUS.MIT.EDU
destination: OPUS.MIT.EDU
interface: fe0
flags: <UP,HOST,DONE,LLINFO>
  recvpipe  sendpipe  ssthresh  rtt,msec    rttvar  hopcount       mtu    expire
         0         0         0         0         0         0         0      1185
If I attempt to connect to a machine on another subnet of network 18
(18.177.0.64), 1460 is used:
11:13:55.532712 LOLA-GRANOLA.MIT.EDU.1745 > PACKET-DROP.MIT.EDU.www: S 1724864000:1724864000(0) win 16384 <mss 1460,nop,wscale 0,nop,nop,timestamp 2621440 3277675825>
[lola-granola!jhawk] ~> route get packet-drop
route to: PACKET-DROP.MIT.EDU
destination: default
mask: default
gateway: NW12A-RTR-W20-ETHER.MIT.EDU
interface: fe0
flags: <UP,GATEWAY,DONE,STATIC>
  recvpipe  sendpipe  ssthresh  rtt,msec    rttvar  hopcount       mtu    expire
         0         0         0         0         0         0         0         0
However, if I attempt to connect to a machine outside of network 18
(199.94.220.184), an MSS of 512 is used:
11:15:57.250897 LOLA-GRANOLA.MIT.EDU.1746 > all-purpose-gunk.near.net.www: S 1740480000:1740480000(0) win 16384 <mss 512,nop,wscale 0,nop,nop,timestamp 2651311 1029594417>
[lola-granola!jhawk] ~> route get ap-gunk.near.net
route to: all-purpose-gunk.near.net
destination: default
mask: default
gateway: NW12A-RTR-W20-ETHER.MIT.EDU
interface: fe0
flags: <UP,GATEWAY,DONE,STATIC>
  recvpipe  sendpipe  ssthresh  rtt,msec    rttvar  hopcount       mtu    expire
         0         0         0         0         0         0         0         0
This seems awfully inconsistent. There does not seem to be any good
reason why connections to PACKET-DROP.MIT.EDU and
ALL-PURPOSE-GUNK.NEAR.NET should not use the same MSS, since both are
reached through the same default route. I'm not very familiar with
the internals of the BSD TCP code, so looking there hasn't been
helpful, but I theorize that something is treating the network route
to 18.70.0.0 255.255.0.0 as a route to 18.0.0.0 255.0.0.0, at least
for purposes of MSS computation (i.e. the stated mask of the route is
being ignored and the classful mask is being assumed).
This seems horribly wrong and broken, but it is operationally slightly
better than having _all_ connections off the local network use an MSS
of 512.
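The hypothesis can be checked against the three traces with a short
sketch. This is not the kernel's code, only an illustration of the
theory: the helper names and the rule "local gets 1460, everything
else gets the 512 default" are assumptions made to model the observed
behavior.

```python
import ipaddress

def classful_network(addr):
    """Return the classful (A/B/C) network that contains addr."""
    first_octet = int(ipaddress.IPv4Address(addr)) >> 24
    if first_octet < 128:
        prefix = 8     # class A
    elif first_octet < 192:
        prefix = 16    # class B
    else:
        prefix = 24    # class C (classes D and E are ignored in this sketch)
    return ipaddress.ip_network(f"{addr}/{prefix}", strict=False)

def predicted_mss(dest, local_net):
    """Hypothetical rule: 1460 if dest is 'local', else the 512 default."""
    return 1460 if ipaddress.IPv4Address(dest) in local_net else 512

host = "18.70.0.26"
subnet = ipaddress.ip_network("18.70.0.0/255.255.0.0")  # mask as configured
classful = classful_network(host)                        # 18.0.0.0/8

# MSS values taken from the tcpdump output in this report.
observed = {"18.70.0.252": 1460, "18.177.0.64": 1460, "199.94.220.184": 512}

by_subnet = {d: predicted_mss(d, subnet) for d in observed}
by_classful = {d: predicted_mss(d, classful) for d in observed}

print("configured mask:", by_subnet)
print("classful mask:  ", by_classful)
```

Only the classful prediction matches all three observations; with the
configured 255.255.0.0 mask, 18.177.0.64 would be treated as non-local
and get 512.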
As an aside, connections to localhost use an MSS of 30720. One would
think this could be improved substantially (but perhaps not?).
>Fix:
1. Implement path MTU discovery. FreeBSD has it, so we really
should get it at some point. I suppose this is unlikely to
happen soon.
2. Fix the aforementioned masking problem. Unfortunately this
seems somewhat counterproductive if nothing else is done.
3. Change tcp_mssdflt from 512 to 1460. This is the easy way
out. Unfortunately, I'm not quite sure what effect it will
have when there are links in the middle of the path whose MTU
is less than 1500. I suppose it is likely to cause
fragmentation on those links, but given the structure of the
modern Internet, anyone who has a link with an MTU less than
1500 isn't really concerned about performance anyway (i.e.
it is a dialup link), so perhaps we don't care if they
fragment (a rationalization, but one that seems reasonably
plausible).
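The cost of option 3 on a small-MTU link can be estimated with a
minimal sketch of IPv4 fragmentation arithmetic. The 576-byte link MTU
is an assumed example value, not something measured here, and the
function names are hypothetical; IP and TCP options are ignored.

```python
IP_HEADER = 20    # bytes, IPv4 header without options
TCP_HEADER = 20   # bytes, TCP header without options

def mss_for_mtu(mtu):
    """Largest TCP payload that fits in one unfragmented packet."""
    return mtu - IP_HEADER - TCP_HEADER

def fragments_needed(packet_len, link_mtu):
    """Count the IPv4 fragments needed to carry one packet over link_mtu."""
    if packet_len <= link_mtu:
        return 1
    payload = packet_len - IP_HEADER
    # Every fragment except the last must carry a multiple of 8 bytes.
    per_fragment = (link_mtu - IP_HEADER) // 8 * 8
    return (payload + per_fragment - 1) // per_fragment

ethernet_mss = mss_for_mtu(1500)                      # 1460
full_packet = ethernet_mss + IP_HEADER + TCP_HEADER   # 1500 bytes on the wire
small_packet = 512 + IP_HEADER + TCP_HEADER           # 552 bytes on the wire

small_link_mtu = 576  # assumed MTU of a hypothetical link mid-path

big_frags = fragments_needed(full_packet, small_link_mtu)
small_frags = fragments_needed(small_packet, small_link_mtu)
print(ethernet_mss, big_frags, small_frags)
```

Under these assumptions a full-size packet built for an MSS of 1460
crosses the 576-byte link as 3 fragments, while a packet built for an
MSS of 512 passes unfragmented.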
>Audit-Trail:
>Unformatted: