Subject: Re: Networking question: MTU on non-local nets
To: None <port-macppc@netbsd.org>
From: Donald Lee <MacPPC@caution.icompute.com>
List: port-macppc
Date: 06/15/2003 09:33:33
Recently, der Mouse cogently squeaked:
>[Sparkle] 864> traceroute -P pddf654.tkyoac00.ap.so-net.ne.jp
>traceroute to pddf654.tkyoac00.ap.so-net.ne.jp (218.221.246.84), 30 hops max, 17914 byte packets
>message too big, trying new MTU = 1500
> 1 Stone (216.46.5.9) 7.167 ms 5.946 ms *
> 2 core-04.openface.ca (216.46.14.121) 54.841 ms 52.423 ms 52.056 ms
> 3 bob.openface.ca (216.46.1.1) 51.988 ms 51.767 ms 51.595 ms
> 4 doug.openface.ca (216.46.1.16) 52.112 ms 52.349 ms 57.553 ms
> 5 border-peer1.openface.ca (216.46.0.245) 154.647 ms 54.558 ms 57.028 ms
> 6 openface-gw.peer1.net (65.39.144.129) 53.590 ms 54.071 ms 53.243 ms
> 7 Gig4-0.mtl-gsr-a.peer1.net (216.187.90.229) 54.545 ms 68.725 ms 54.515 ms
> 8 OC48POS0-0.nyc-gsr-b.peer1.net (216.187.123.234) 62.799 ms 77.577 ms 63.227 ms
> 9 GIG1-0.wdc-gsr-a.peer1.net (216.187.123.226) 67.580 ms 68.829 ms 68.616 ms
>10 ge-2-3-0.r02.asbnva01.us.bb.verio.net (206.223.115.112) 68.804 ms 67.953 ms 70.125 ms
>11 p16-0-1-2.r21.asbnva01.us.bb.verio.net (129.250.2.62) 74.460 ms 69.896 ms 69.524 ms
>12 p16-5-0-0.r01.mclnva02.us.bb.verio.net (129.250.2.180) 69.997 ms 70.349 ms 72.354 ms
>13 p16-7-0-0.r02.mclnva02.us.bb.verio.net (129.250.5.10) 71.260 ms 71.407 ms 70.120 ms
>14 p16-0-1-2.r20.plalca01.us.bb.verio.net (129.250.2.192) 127.981 ms 128.502 ms 129.232 ms
>15 xe-0-2-0.r21.plalca01.us.bb.verio.net (129.250.4.231) 128.421 ms 127.746 ms 144.905 ms
>16 p64-0-0-0.r21.snjsca01.us.bb.verio.net (129.250.5.49) 128.569 ms 137.866 ms 128.656 ms
>17 p16-1-1-0.r82.mlpsca01.us.bb.verio.net (129.250.3.195) 128.506 ms 129.103 ms 128.873 ms
>18 p16-0-2-0.r21.tokyjp01.jp.bb.verio.net (129.250.4.158) 243.949 ms 245.087 ms 244.665 ms
>19 xe-1-1-0.r20.tokyjp01.jp.bb.verio.net (129.250.3.233) 243.093 ms 242.310 ms 242.633 ms
>20 ge-3-0-0.a10.tokyjp01.jp.ra.verio.net (61.213.162.76) 230.151 ms 229.910 ms 230.652 ms
>21 61.120.146.230 (61.120.146.230) 230.672 ms ge-3-0-0.a10.tokyjp01.jp.ra.verio.net (61.213.162.76) 243.620 ms 241.961 ms
>22 61.120.146.230 (61.120.146.230) 242.440 ms 242.022 ms note-13Gi0-0-0.net.so-net.ne.jp (61.211.63.133) 230.177 ms
>23 61.211.63.247 (61.211.63.247) 232.482 ms 232.314 ms 234.189 ms
>24 61.211.63.247 (61.211.63.247) 232.197 ms 231.845 ms 233.382 ms
>25 61.211.63.247 (61.211.63.247) 234.610 ms
>fragmentation required and DF set, next hop MTU = 1454
>25 pddf654.tkyoac00.ap.so-net.ne.jp (218.221.246.84) 269.161 ms 269.644 ms 269.784 ms
>[Sparkle] 865>
>
>If this is to be believed, the low-MTU link is the very last hop. I
>really wonder what's with hops 21/22 and 23/24; the way different
>gateways respond on lines 21 and 22, it appears there is some kind of
>variant routing going on - loadsharing, maybe.
>
>>> A more detailed description of the client-side network might help:
>>> where is the low-MTU link, what hardware is on each end of it, where
>>> is the NAT being done, what speeds are the various pieces running
>>> at, that sort of thing.
>> As you can see above, getting that sort of data would be non-trivial.
>> It's tough enough getting the gentleman in Japan to send us the bits
>> of data he has.
>
>:-( I misunderstood; I thought you actually had control over both ends
>of the test connection.
>
>Because my traceroute -P worked, I feel confident that the ICMP
>unreachables necessary to drive PMTU-D are making it out from the
>Japanese end of things. But it does look to me as though something is
>broken on the client side; some but not all of the second frags making
>it through - but all the first frags working - practically guarantees
>that there is something wrong between the fragmentation point and the
>endpoint; since the fragmentation point is right next to the endpoint
>per my traceroute above, this means it's on that end.
I have a feeling that the endpoint is on some sort of dynamic IP, too.
When I ran ping tests against this endpoint (ping -s xxx japan-guy), I
noticed that packets over 1400 bytes were getting through. This
suggests that the fragmentation at ~1K was, or is, transient.
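
For anyone repeating the ping test, the sizing arithmetic goes roughly
like this (1454 is the next-hop MTU from the traceroute above, and
"japan-guy" is still just a stand-in hostname):

    # 1454 (path MTU) - 20 (IP header) - 8 (ICMP header) = 1426 data bytes
    ping -c 3 -s 1426 japan-guy   # should fit the 1454-byte link whole
    ping -c 3 -s 1427 japan-guy   # should get fragmented at that link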
>[snip]
>If it would help you, I can set up a machine deliberately behind a
>low-MTU link we can run experiments with. (If you want to take me up
>on that, off-list is probably best.)
Thank you. That's very generous.
I've been watching my server now that I've enabled PMTU-D, and I've
noticed two things. One, the machine is still up and performing
reasonably. (good) Two, when I look at the traffic and watch
specifically for ICMP "must frag" messages, they do happen, but they
are pretty rare; I see a small number of them per hour on this server.
(The server gets roughly 1.5 million HTTP requests per month.)
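
For anyone who wants to do the same kind of watching: by "must frag"
messages I mean ICMP type 3, code 4 ("fragmentation needed and DF
set"), which a tcpdump filter can match directly; substitute whatever
interface your server actually uses:

    # match ICMP "fragmentation needed and DF set" (type 3, code 4)
    tcpdump -n -i <interface> 'icmp[0] == 3 and icmp[1] == 4'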
I am satisfied at this point that PMTU-D is safe and effective, so
unless I encounter some major badness, I plan to leave things as they
are.
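
For anyone else who wants to turn this on under NetBSD, the relevant
knob should be the net.inet.ip.mtudisc sysctl (check sysctl(8) on your
release in case the name differs):

    sysctl net.inet.ip.mtudisc        # 1 = path MTU discovery enabled
    sysctl -w net.inet.ip.mtudisc=1   # turn it on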
Many thanks to you and Manuel for your insights.
-dgl-