tech-net: MTU questions

Subject: MTU questions
To: None <tech-net@netbsd.org>
From: Hal Snyder <hal@vailsys.com>
List: tech-net
Date: 08/01/2001 03:33:59

We are phasing in BGP multihoming with NetBSD on our border routers.
Recently, during routing tests, one of our web servers began to
stall on transfers to the outside.

Apparently we had a case of path MTU black-holing. Our situation became
much clearer after reading RFC 2923, section 2.1, btw.

Two questions follow. Mainly, I'd like to know whether these are bugs
or normal operation that just came as a surprise.


Here is what happened:

1. Normal operation - border router connects to Internet via
fxp1. There is a single route on that interface - a static
default to the ISP peer. The MTU on both the interface and the
route is of course 1500.

2. Route around simulated outage - change static default route to
remote address for gif0; route acquires an MTU of 1280 from that
interface.

3. Simulated outage is over - change static default route back to
ISP peer reached from fxp1.  At this point, interface MTU is 1500
but route MTU is 1280.

Question #1: is automatically adjusting route MTU down on route change
but not up a bug, or a feature?
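To make the sequence in steps 1-3 concrete, here is a toy model of what we observed. It is only a sketch: the names (Route, IF_MTU, change_route_observed, change_route_reset) are made up for illustration, and the min() clamp is my assumption about the mechanism, not something taken from the NetBSD kernel source.

```python
# Toy model of steps 1-3. Not NetBSD kernel code; all names are hypothetical.
from dataclasses import dataclass


@dataclass
class Route:
    ifname: str  # interface the default route currently points out of
    mtu: int     # MTU recorded on the route itself


# Interface MTUs as reported in the post.
IF_MTU = {"fxp1": 1500, "gif0": 1280}


def change_route_observed(route, new_if):
    """Assumed model of what we saw: route MTU is lowered, never raised."""
    return Route(new_if, min(route.mtu, IF_MTU[new_if]))


def change_route_reset(route, new_if):
    """Hypothetical alternative: route MTU always follows the new interface."""
    return Route(new_if, IF_MTU[new_if])


def replay(change):
    """Replay fxp1 -> gif0 -> fxp1 and record the route MTU after each step."""
    route = Route("fxp1", IF_MTU["fxp1"])
    history = [route.mtu]
    for ifname in ("gif0", "fxp1"):
        route = change(route, ifname)
        history.append(route.mtu)
    return history


print(replay(change_route_observed))  # [1500, 1280, 1280]
print(replay(change_route_reset))     # [1500, 1280, 1500]
```

The first line of output matches what we ended up with after step 3; the second is what I had expected to happen.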

4. Tcpdump on the web server shows that, during stalled transfers, the
ICMP need-to-frag is giving an MTU of 1500 (the interface MTU) and not
1280 (the route MTU). So packets sent by the web server never get
short enough to make it past the default route.

02:30:42.536379 web.client.3304 > web.server.80: S 2896394281:2896394281(0) win 16384 <mss 1460,nop,wscale 0,nop,nop,timestamp 97257233 0> (DF)
02:30:42.536454 web.server.80 > web.client.3304: S 2057701234:2057701234(0) ack 2896394282 win 17520 <mss 1460> (DF)
02:30:42.537823 web.client.3304 > web.server.80: . ack 1 win 17520 (DF)
02:30:42.539207 web.client.3304 > web.server.80: P 1:266(265) ack 1 win 17520 (DF)
02:30:42.539840 web.server.80 > web.client.3304: . 1:1461(1460) ack 266 win 17520 (DF)
02:30:42.539860 web.server.80 > web.client.3304: . 1461:2921(1460) ack 266 win 17520 (DF)
02:30:42.540232 router.inside > web.server: icmp: web.client unreachable - need to frag (mtu 1500)
02:30:42.540338 router.inside > web.server: icmp: web.client unreachable - need to frag (mtu 1500)
...
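The arithmetic behind the stall in the trace above can be sketched as follows. The server sends 1460-byte segments, which with 40 bytes of IPv4 and TCP headers are 1500-byte packets. This is a simplified model of a path MTU discovery sender, not any real stack's code; the function name pmtud_converges and its parameters are made up for illustration.

```python
# Simplified path MTU discovery sender. Hypothetical names; not real stack code.
IP_TCP_HEADERS = 40  # 20-byte IPv4 header + 20-byte TCP header, no options


def pmtud_converges(mss, real_limit, advertised_mtu, max_rounds=5):
    """Return (delivered, packet_size) for a sender that shrinks its packets
    to whatever MTU the router reports in ICMP need-to-frag.

    real_limit     -- size the router actually enforces (the route MTU)
    advertised_mtu -- MTU value carried in the ICMP message
    """
    packet = mss + IP_TCP_HEADERS
    for _ in range(max_rounds):
        if packet <= real_limit:
            return True, packet  # packet fits and gets forwarded
        # Router drops the DF packet and reports advertised_mtu.
        new_packet = min(packet, advertised_mtu)
        if new_packet == packet:
            return False, packet  # sender learns nothing new: black hole
        packet = new_packet
    return packet <= real_limit, packet


# What the trace shows: ICMP reports the interface MTU (1500).
print(pmtud_converges(1460, real_limit=1280, advertised_mtu=1500))  # (False, 1500)
# What would happen if ICMP reported the route MTU (1280).
print(pmtud_converges(1460, real_limit=1280, advertised_mtu=1280))  # (True, 1280)
```

With the ICMP message carrying 1500, the server already believes the path MTU is 1500 and has no reason to send anything shorter, so every retransmission is dropped again.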

Question #2: isn't it a bug to send the interface MTU and not the route MTU
in the ICMP need-to-frag, when the latter is causing packets to be dropped?

Thanks.