Subject: Re: kern/2274: NetBSD's MTU/MSS handling is rather broken
To: None <jhawk@MIT.EDU>
From: None <Havard.Eidnes@runit.sintef.no>
List: tech-net
Date: 04/03/1996 23:17:12
> This seems awfully inconsistent.

I do not have all the detailed references to BSD versions at
hand, but I think the following is relatively accurate:

This is the way TCP on BSD has been (almost?) since its
inception.  I think that initially only systems on the local wire
were spoken to using the interface MTU, all other destinations
used a TCP MSS fitting within an MTU of 576 bytes.  Later this
was modified by introducing the "subnets are local" variable.
When it was set, all subnets of the local traditional classful
network number were considered "local", where it was assumed that
a TCP MSS corresponding to the interface MTU could safely be used
to reach all systems.  For all other destinations a TCP MSS would
be used to fit in an MTU of 576 bytes -- this was to avoid
fragmentation.  I'm not positive where the 576 value comes from,
I seem to recall it being a "recommended minimum MTU", but I may
be wrong.
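
To make the selection rule above concrete, here is a rough sketch of it
in Python.  This is not the kernel code; the names choose_mss and
classful_network are made up for illustration, and I am assuming 40
bytes of IP + TCP headers with no options.

```python
import ipaddress

IP_TCP_HEADERS = 40   # assumed: 20-byte IP header + 20-byte TCP header, no options
DEFAULT_MTU = 576     # conservative MTU assumed for non-local destinations


def classful_network(addr):
    """Return the traditional class A/B/C network containing addr.

    Simplification: anything above class B is treated as a /24.
    """
    first_octet = int(addr) >> 24
    if first_octet < 128:
        prefix = 8
    elif first_octet < 192:
        prefix = 16
    else:
        prefix = 24
    return ipaddress.ip_network(f"{addr}/{prefix}", strict=False)


def choose_mss(dest, iface, if_mtu, subnets_are_local):
    """Pick a TCP MSS for dest (hypothetical helper, not a BSD function).

    iface is the local address with its subnet mask, e.g. "129.241.1.10/24".
    """
    dest_addr = ipaddress.ip_address(dest)
    local = ipaddress.ip_interface(iface)
    if subnets_are_local:
        # Every subnet of our classful network counts as "local".
        is_local = dest_addr in classful_network(local.ip)
    else:
        # Only the directly attached subnet counts as "local".
        is_local = dest_addr in local.network
    mtu = if_mtu if is_local else DEFAULT_MTU
    return mtu - IP_TCP_HEADERS
```

With an Ethernet interface (MTU 1500) this gives an MSS of 1460 for
"local" destinations and 536 for everything else.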

I agree that the realities of the world have now made this way of
choosing the TCP MSS "suboptimal".

I also agree that the correct fix is to implement Path MTU
Discovery.  (Oh, BTW, its use should be tweakable via sysctl(),
as you can't always rely on Path MTU Discovery to work properly).

Just blindly using 1460 as the TCP MSS in every direction is a
really awful solution, as that will in many cases cause
fragmentation, which is bad.

As for why fragmentation is bad, I seem to remember the title of
a paper called "Fragmentation Considered Harmful" (written by
Jeff Mogul?), and I think the argument goes something like this:
If you have precious little buffer space on an outgoing interface
leading out to a link with a small MTU (ok, granted, this is a
"near-pathological" situation), a single IP "jumbogram" will have
to be fragmented.  However, consider what happens if not all the
fragments find room in the output queue: some fragments of the
jumbogram will consistently be dropped, and since there is no
retransmission for the individual fragments of an IP datagram,
the jumbogram will never correctly reassemble at the final
destination.
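
The argument can be put into a toy model.  The sketch below is my own
illustration, not something taken from the paper: it assumes the whole
burst of fragments arrives at once, that the output queue has a fixed
number of free slots each time, and that the sender retransmits the
same datagram unchanged.  All the function names are hypothetical.

```python
import math

IP_HEADER = 20  # assumed: IP header without options


def fragment_count(datagram_len, link_mtu):
    """Number of fragments needed to send datagram_len bytes over link_mtu."""
    if datagram_len <= link_mtu:
        return 1
    payload = datagram_len - IP_HEADER
    # Data in every fragment but the last must be a multiple of 8 bytes.
    per_fragment = (link_mtu - IP_HEADER) // 8 * 8
    return math.ceil(payload / per_fragment)


def delivered(datagram_len, link_mtu, free_slots):
    """True only if every fragment finds room in the output queue."""
    needed = fragment_count(datagram_len, link_mtu)
    queued = min(needed, free_slots)  # the remaining fragments are dropped
    return queued == needed


def attempts_until_delivered(datagram_len, link_mtu, free_slots, max_tries):
    """Retransmit the same datagram; return the attempt that works, or None."""
    for attempt in range(1, max_tries + 1):
        if delivered(datagram_len, link_mtu, free_slots):
            return attempt
    return None
```

For example, a 1500-byte datagram over a link with an MTU of 296 needs
6 fragments.  With only 4 free slots in the queue, every retransmission
loses the same tail of fragments and the datagram never reassembles,
whereas a 576-byte datagram (3 fragments) gets through on the first
attempt.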

- Havard