Subject: Re: Networking question MTU on non-local nets
To: der Mouse <mouse@Rodents.Montreal.QC.CA>
From: Donald Lee <MacPPC@caution.icompute.com>
List: port-macppc
Date: 06/14/2003 16:40:33
I really love this list.  Thanks, guys!

At 4:23 PM -0400 6/14/03, der Mouse wrote:
>> Unless you have PMTU turned on, using an ethernet MTU out "in the
>> internet" is not safe.
>
>Why not?  For what value of "safe"?

My bad.  I've been doing some reading.  I had the mistaken impression that
fragmentation of TCP didn't work at all.

The problem I have apparently requires four things to cause _badness_:

	1. Running NetBSD 1.5.2, which (apparently) does not drop back to
	a small (~512) MTU on non-local connections with PMTU-D off
	(as I believe it should).

	2. PMTU-D off.

	3. Some link between the endpoints having a small-ish MTU.

	4. One of the routers' (or endpoint's) fragmentation/re-assembly
	is busted.

What I was seeing in the remote tcpdump was: (caution: long lines)

22:50:05.864796 192.168.0.38.49815 > mercy.icompute.com.http: S 1026952240:1026952240(0) win 32768 <mss 1460,nop,wscale 0,nop,nop,timestamp 707979176 0> (DF)
22:50:06.123198 mercy.icompute.com.http > 192.168.0.38.49815: S 2847775046:2847775046(0) ack 1026952241 win 16384 <mss 1414,nop,wscale 0,nop,nop,timestamp 17981721 707979176>
22:50:06.123306 192.168.0.38.49815 > mercy.icompute.com.http: . ack 1 win 33648 <nop,nop,timestamp 707979176 17981721> (DF)
22:50:06.127597 192.168.0.38.49815 > mercy.icompute.com.http: P 1:225(224) ack 1 win 33648 <nop,nop,timestamp 707979176 17981721> (DF)
22:50:06.397694 mercy.icompute.com.http > 192.168.0.38.49815: . 1:993(992) ack 225 win 17520 <nop,nop,timestamp 17981722 707979176> (frag 8006:1024@0+)
22:50:06.397704 mercy.icompute.com > 192.168.0.38: (frag 8006:422@1024)
22:50:06.444745 192.168.0.38.49815 > mercy.icompute.com.http: . ack 1415 win 33648 <nop,nop,timestamp 707979177 17981722> (DF)
22:50:06.705762 mercy.icompute.com.http > 192.168.0.38.49815: . 1415:2407(992) ack 225 win 17520 <nop,nop,timestamp 17981722 707979176> (frag 8007:1024@0+)
22:50:06.718277 mercy.icompute.com.http > 192.168.0.38.49815: . 2829:3821(992) ack 225 win 17520 <nop,nop,timestamp 17981722 707979176> (frag 8008:1024@0+)
22:50:07.761468 mercy.icompute.com.http > 192.168.0.38.49815: . 1415:2407(992) ack 225 win 17520 <nop,nop,timestamp 17981724 707979176> (frag 8009:1024@0+)
22:50:10.761691 mercy.icompute.com.http > 192.168.0.38.49815: . 1415:2407(992) ack 225 win 17520 <nop,nop,timestamp 17981730 707979176> (frag 8010:1024@0+)

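To my ignorant eyes, it looked like some of the fragments were not
getting through: only datagram 8006 shows its trailing fragment
(422@1024), while 8007 through 8010 show just the first fragment and
the server keeps retransmitting 1415:2407.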
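
A rough way to check a trace like this mechanically, as a minimal sketch:
it assumes tcpdump's "(frag ID:LEN@OFFSET)" annotation format as seen above,
with a trailing "+" meaning more fragments follow. The function name and the
abbreviated sample lines are mine, for illustration only.

```python
import re
from collections import defaultdict

# Matches tcpdump's "(frag ID:LEN@OFFSET)" annotation; a trailing "+"
# means "more fragments follow".
FRAG_RE = re.compile(r"\(frag (\d+):(\d+)@(\d+)(\+?)\)")

def incomplete_datagrams(lines):
    """Return the sorted IP IDs whose fragments never complete the datagram."""
    frags = defaultdict(list)
    for line in lines:
        m = FRAG_RE.search(line)
        if m:
            ip_id, length, offset = (int(g) for g in m.group(1, 2, 3))
            frags[ip_id].append((offset, length, m.group(4) == "+"))
    missing = []
    for ip_id, parts in frags.items():
        expected = 0          # next byte offset we still need
        complete = False
        for offset, length, more in sorted(parts):
            if offset > expected:
                break         # gap: an earlier fragment is missing
            expected = max(expected, offset + length)
            if not more:
                complete = True   # reached the final fragment with no gaps
                break
        if not complete:
            missing.append(ip_id)
    return sorted(missing)

# Fragment annotations from the trace above, with the rest of each line cut.
trace = [
    "(frag 8006:1024@0+)",
    "(frag 8006:422@1024)",
    "(frag 8007:1024@0+)",
    "(frag 8008:1024@0+)",
    "(frag 8009:1024@0+)",
    "(frag 8010:1024@0+)",
]
print(incomplete_datagrams(trace))   # [8007, 8008, 8009, 8010]
```

This only says which datagrams never showed a complete set of fragments at
the capture point; it cannot say which hop lost them.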


> [snip]
>> If you hit a small packet router (i.e. PPPoE, VPN, etc) the
>> fragmented or oversized packets effectively get silently dropped.
>
>If they're sent with DF clear, routers are supposed to fragment as
>necessary.  If you have DF set, routers are supposed to send back an
>"unreachable - fragmentation needed but DF set" ICMP.
>
>Neither one is "silently dropped".  If you're seeing silent drops,
>something is broken somewhere.

Yup.  I understand that now.
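
For my own benefit, here is the rule der Mouse describes as a small sketch.
It is a simplified model, not any real router's code: it assumes a 20-byte
IPv4 header with no options, and the function name and return shapes are
made up for illustration. The 1044-byte MTU in the example is also an
assumption; the trace only shows that the first fragment carried 1024 bytes.

```python
IP_HEADER = 20  # bytes; assumes an IPv4 header with no options

def forward(payload_len, mtu, df):
    """Model what a router should do with one IPv4 datagram.

    payload_len counts the bytes after the IP header (TCP header + data).
    Returns a list of (length, offset, more_fragments) tuples to send on,
    or a string naming the ICMP error to send back instead.
    """
    if IP_HEADER + payload_len <= mtu:
        return [(payload_len, 0, False)]      # fits: forward unchanged
    if df:
        # Too big and DF set: drop it, but tell the sender why.
        return "ICMP unreachable: fragmentation needed but DF set"
    # Every fragment except the last must carry a multiple of 8 bytes.
    chunk = (mtu - IP_HEADER) // 8 * 8
    if chunk <= 0:
        raise ValueError("MTU too small to carry any payload")
    frags, offset = [], 0
    while offset < payload_len:
        length = min(chunk, payload_len - offset)
        more = offset + length < payload_len
        frags.append((length, offset, more))
        offset += length
    return frags

# 1446 = 1024 + 422, the two fragment sizes of datagram 8006 in my trace.
print(forward(1446, 1044, df=False))  # [(1024, 0, True), (422, 1024, False)]
print(forward(1446, 1044, df=True))
```

Either way the sender ends up with fragments delivered or an ICMP in hand;
nothing in the rule allows a silent drop.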

>>>> I have also learned that MTU path discovery is an option, but this
>>>> is not on by default, and I am a little afraid of it.
>>> I have it enabled on all my servers, and I didn't notice problems.
>> Thanks.  I've turned it on, too.
>
>Be careful.  If you have any packet filtering, make sure it lets
>through the ICMPs I described above.  It's very common, in my
>experience, for webservers and mailhosts to have PMTU-D on but be
>behind something that apparently drops the ICMPs that drive PMTU-D.
>Back when I was behind a low-MTU link, I regularly saw hosts connecting
>to me and doing protocol until they wanted to send bulk data, at which
>point the connection locked up.  tcpdumping outside the low-MTU link (I
>was fortunate in that I had such access) revealed that I'd get a large
>packet, send back the ICMP, wait, get the same size packet, send back
>another ICMP, lather-rinse-repeat until the far end decides I've gone
>dead and gives up.
>
>I've sent out numerous emails about it, but the only case where I ever
>got anything fixed was one where I personally knew the server's admin,
>and even then it took a good deal of tweaking and retesting with a
>parallel comm channel open between us.
>
>It's pretty close to the point where I'd say that such configs have
>broken things enough that the de-facto minimum MTU on the net is 1500.

It's pretty clear that my webserver - up till today - was sending out
packets in the 1400+ range to just about everyone, and I've had
very few complaints, so obviously you can get away with it _most_
of the time.

(Note: the 1.6.1 NetBSD kernel drops back to 500+ MTU for non-local
paths with PMTU-D off.)

Worst case, turning on PMTU-D should not *hurt* anything on my server,
as it should still work fine for all those connections that can handle
the larger packets.  The only places I get in trouble are those
that were broken before **and** have broken routers and/or packet
filtering that causes PMTU-D to fail to function.  In either
case, I should be no worse off than I am today.
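
To convince myself, here is a toy model of a PMTU-D sender, covering both
the working case and the "ICMPs get filtered" lockup der Mouse described.
It is a sketch under simplifying assumptions: one bottleneck link, every
packet sent with DF set, and the ICMP telling the sender the usable MTU.
The function and parameter names are hypothetical, and the 1044-byte path
MTU is an illustrative value.

```python
def send_with_pmtud(packet_size, path_mtu, icmp_delivered, max_tries=5):
    """Simulate a DF-set sender; return (delivered, final_size, tries)."""
    size = packet_size
    for attempt in range(1, max_tries + 1):
        if size <= path_mtu:
            return True, size, attempt   # packet fits and gets through
        # The bottleneck router drops the packet and emits
        # "fragmentation needed but DF set".
        if icmp_delivered:
            size = path_mtu              # sender lowers its path MTU estimate
        # Otherwise the ICMP was filtered: the sender learns nothing and
        # retransmits the same size on the next pass.
    return False, size, max_tries        # sender gives up

print(send_with_pmtud(1500, 1500, icmp_delivered=False))  # (True, 1500, 1)
print(send_with_pmtud(1500, 1044, icmp_delivered=True))   # (True, 1044, 2)
print(send_with_pmtud(1500, 1044, icmp_delivered=False))  # (False, 1500, 5)
```

Only the third case fails, and that is the path with a small MTU plus
filtered ICMPs.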

-dgl-