NetBSD-Bugs archive


kern/48472: NetBSD may send ICMP_UNREACH_NEEDFRAG with "next MTU" larger than packet that caused it to be sent



>Number:         48472
>Category:       kern
>Synopsis:       NetBSD may send ICMP_UNREACH_NEEDFRAG with "next MTU" larger than packet that caused it to be sent
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Dec 23 04:45:00 +0000 2013
>Originator:     Dave Huang
>Release:        NetBSD 6.99.17
>Organization:
Name: Dave Huang         |  Mammal, mammal / their names are called /
INet: khym%azeotrope.org@localhost |  they raise a paw / the bat, the cat /
FurryMUCK: Dahan         |  dolphin and dog / koala bear and hog -- TMBG
Dahan: Hani G Y+C 38 Y++ L+++ W- C++ T++ A+ E+ S++ V++ F- Q+++ P+ B+ PA+ PL++
>Environment:
        
        
System: NetBSD foxy.azeotrope.org 6.99.28 NetBSD 6.99.28 (FOXY) #20: Fri Dec 20 00:04:49 CST 2013 khym%vmbsd.azeotrope.org@localhost:/usr/obj.i386/sys/arch/i386/compile/FOXY i386
Architecture: i386
Machine: i386
>Description:
        If you add a route to a destination with an MTU override that
is smaller than the interface MTU, NetBSD uses that route MTU when
fragmenting packets it forwards to that destination. If such a packet
has the Don't Fragment (DF) bit set, NetBSD drops it and sends an ICMP
destination unreachable message (fragmentation needed and DF set).
However, the next-hop MTU reported in that ICMP message is the
interface MTU rather than the smaller route MTU. The reported MTU can
therefore be larger than the packet that was dropped, which breaks
path MTU discovery.
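
The mismatch described above can be modeled in a few lines of C. This
is a minimal sketch, not kernel code: struct fwd_path and all three
function names are hypothetical, and the route MTU is assumed to be 0
when no override is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of a forwarding path (not a kernel struct). */
struct fwd_path {
	uint32_t if_mtu;	/* MTU of the outgoing interface */
	uint32_t route_mtu;	/* MTU override on the route; 0 = not set (assumption) */
};

/* MTU used for the forwarding decision: route override if set, else interface MTU. */
static uint32_t
forwarding_mtu(const struct fwd_path *p)
{
	assert(p->if_mtu > 0);
	return p->route_mtu != 0 ? p->route_mtu : p->if_mtu;
}

/* A DF packet is dropped when it exceeds the MTU used for forwarding. */
static bool
df_packet_dropped(const struct fwd_path *p, uint32_t pkt_len)
{
	return pkt_len > forwarding_mtu(p);
}

/*
 * The behavior reported here: the packet is dropped, yet the ICMP message
 * carries the interface MTU, which the packet already fits within.
 */
static bool
report_is_misleading(const struct fwd_path *p, uint32_t pkt_len)
{
	uint32_t reported = p->if_mtu;	/* value the unpatched code reports */

	return df_packet_dropped(p, pkt_len) && reported >= pkt_len;
}
```

With if_mtu = 1500 and route_mtu = 1200, a 1300-byte DF packet makes
report_is_misleading() return true: the packet is dropped, but the
reported MTU gives the sender no smaller size to retry with.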

This is discussed in a thread starting at
<http://mail-index.netbsd.org/tech-net/2013/12/19/msg004418.html>.
There's some disagreement about whether route MTUs should be used at
all when forwarding packets; my understanding is that the concern
stems from PMTU discovery possibly using the routing table as a PMTU
cache. I agree that the PMTU cache should not affect packets being
forwarded through the router from another host, but I do think that if
the system admin manually adds a route with a smaller MTU, that MTU
should be honored when routing packets. Either way, it's agreed that
it's wrong for NetBSD to drop a packet for exceeding the MTU and then
report, in its ICMP fragmentation needed message, an MTU larger than
that packet.

>How-To-Repeat:
On a NetBSD machine acting as a router:
# route add www.netbsd.org $my_gateway_ip -mtu 1200

(replacing $my_gateway_ip with the correct next hop gateway)

Then on another machine that routes through the above router,
$ ping -Ds 1300 www.netbsd.org
PING www.netbsd.org (149.20.53.86): 1300 data bytes
36 bytes from foxy.azeotrope.org (10.1.1.67): frag needed and DF set.  Next MTU=1500 for icmp_seq=0

Note that Next MTU=1500, even though the packet sent is smaller than
1500. Next MTU should be 1200.
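
For reference, the size of the IP packet behind that ping can be
worked out as follows. This is a small sketch that assumes an IPv4
header without options (20 bytes) plus the 8-byte ICMP echo header;
the function and constant names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

enum {
	IPV4_HDR_LEN = 20,	/* IPv4 header without options (assumption) */
	ICMP_ECHO_HDR_LEN = 8	/* ICMP echo request header */
};

/* Total IPv4 packet length for an ICMP echo request with data_len payload bytes. */
static uint32_t
ping_packet_len(uint32_t data_len)
{
	/* IPv4 total length is a 16-bit field, so the packet cannot exceed 65535 bytes. */
	assert(data_len <= 65535u - IPV4_HDR_LEN - ICMP_ECHO_HDR_LEN);
	return IPV4_HDR_LEN + ICMP_ECHO_HDR_LEN + data_len;
}
```

ping_packet_len(1300) is 1328, which exceeds the 1200-byte route MTU
(so the packet is dropped) but fits within the reported Next MTU of
1500.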

Linux does use the route MTU when routing and returns the route MTU in
the ICMP fragmentation needed packet. Tested on Debian Linux, kernel
2.6.32-5-686, by running the following on the router, then doing the
above ping test from a machine that routes through it:
ip route add 149.20.53.86 dev eth0 mtu 1200

It appears from a comment in FreeBSD's ip_input.c:ip_forward() that it
sends the smaller of the interface MTU and the route MTU in its ICMP
fragmentation needed packet, but I haven't confirmed that.

>Fix:
I think this patch will at least make NetBSD consistent. It's already
using the route MTU to determine whether a packet needs to be
fragmented or not; this will make it return the actual MTU used in
that determination.

Index: netinet/ip_input.c
===================================================================
RCS file: /cvsroot/src/sys/netinet/ip_input.c,v
retrieving revision 1.308
diff -u -r1.308 ip_input.c
--- netinet/ip_input.c  29 Jun 2013 21:06:58 -0000      1.308
+++ netinet/ip_input.c  20 Dec 2013 06:04:33 -0000
@@ -1335,7 +1335,8 @@
                code = ICMP_UNREACH_NEEDFRAG;
 
                if ((rt = rtcache_validate(&ipforward_rt)) != NULL)
-                       destmtu = rt->rt_ifp->if_mtu;
+                       destmtu = rt->rt_rmx.rmx_mtu ?
+                           rt->rt_rmx.rmx_mtu : rt->rt_ifp->if_mtu;
 #ifdef IPSEC
                (void)ipsec4_forward(mcopy, &destmtu);
 #endif
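
As a standalone illustration, the selection rule introduced by the
patch can be written as a plain function and compared with the
"smaller of the two" rule that the FreeBSD comment mentioned above
appears to describe. This is only a sketch: the function names and
parameters are hypothetical, a route MTU of 0 is taken to mean "no
override", and the FreeBSD behavior is unconfirmed.

```c
#include <assert.h>
#include <stdint.h>

/* Rule from the patch: report the route MTU if one is set, else the interface MTU. */
static uint32_t
next_mtu_patched(uint32_t route_mtu, uint32_t if_mtu)
{
	assert(if_mtu > 0);
	return route_mtu ? route_mtu : if_mtu;
}

/* Alternative rule: report the smaller of the two, ignoring an unset (0) route MTU. */
static uint32_t
next_mtu_smaller(uint32_t route_mtu, uint32_t if_mtu)
{
	assert(if_mtu > 0);
	if (route_mtu != 0 && route_mtu < if_mtu)
		return route_mtu;
	return if_mtu;
}
```

For the How-To-Repeat case (route MTU 1200, interface MTU 1500) both
rules give 1200, and with no route MTU both give 1500. They differ
only if the route MTU is larger than the interface MTU, where the
patched rule returns the route MTU as-is.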

>Unformatted:
        
        

