NetBSD-Bugs archive

kern/44508: [dM] ICMP_UNREACH_NEEDFRAG uses wrong mtu

>Number:         44508
>Category:       kern
>Synopsis:       [dM] ICMP_UNREACH_NEEDFRAG uses wrong mtu
>Confidential:   no
>Severity:       serious
>Priority:       low
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Feb 03 16:35:00 +0000 2011
>Originator:     Mouse
>Release:        NetBSD 4.0.1
>Environment:
System: NetBSD VAIO-Frank.Rodents-Montreal.ORG 4.0.1 NetBSD 4.0.1 (VAIO-MP) #0: 
Wed Feb 2 21:56:02 EST 2011 
mouse%VAIO-Frank.Rodents-Montreal.ORG@localhost:/home/mouse/kbuild/VAIO-MP i386
Architecture: i386
Machine: i386
>Description:
        When generating an ICMP_UNREACH_NEEDFRAG message, ip_input uses
        the target interface's MTU:

                        destmtu = ipforward_rt.ro_rt->rt_ifp->if_mtu;

        However, if the route's MTU is less than the interface's, this
        is the wrong MTU to use; the resulting ICMP message gives an
        MTU that will not actually work.
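
        As a rough illustration of the mismatch described above, here
        is a minimal sketch.  The struct and function names are
        hypothetical, simplified stand-ins, not the kernel's real
        struct ifnet or struct rtentry, and the convention that a
        route MTU of 0 means "not set" is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct sketch_if {
	unsigned int if_mtu;		/* MTU of the outgoing interface */
};

struct sketch_route {
	struct sketch_if *rt_ifp;	/* interface the route points at */
	unsigned int rt_mtu;		/* per-route MTU; 0 = not set (assumed) */
};

/* What the code quoted above reports: the interface MTU, ignoring the route. */
unsigned int
reported_mtu_buggy(const struct sketch_route *rt)
{
	assert(rt != NULL && rt->rt_ifp != NULL);
	return rt->rt_ifp->if_mtu;
}

/* The MTU that actually limits the packet: the smaller of the two. */
unsigned int
usable_mtu(const struct sketch_route *rt)
{
	unsigned int mtu;

	assert(rt != NULL && rt->rt_ifp != NULL);
	mtu = rt->rt_ifp->if_mtu;
	if (rt->rt_mtu != 0 && rt->rt_mtu < mtu)
		mtu = rt->rt_mtu;
	return mtu;
}
```

        With an interface MTU of 1500 and a route MTU of 1400, the
        first function yields 1500 and the second 1400, which is the
        gap between what the ICMP message advertises and what will
        really pass.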

        This is a regression as compared to 1.4T.  I discovered this
        bug the hard way when moving my house gateway (which, for
        reasons not relevant here, has such a route) from 1.4T to
        4.0.1, and found things breaking.
>How-To-Repeat:
        Configure a 4.0.1 machine as a router, with a route whose MTU
        is less than that of the target interface.  Try to send a
        packet with DF set through it; notice the MTU in the ICMP is

        My test setup when working on the fix for this involved two
        machines, A, with ex0 and vr0, and B, with re0 and cue0.  A ex0
        and B re0 are on the same switch; A vr0 and B cue0 are
        connected with a crossover cable.  (It's possible B's cue0 is
        unnecessary; I used it as a convenient way to get carrier on
        A's vr0.  Another switch would have worked as well, but cue0
        was handier - and, with a little ARP hackery, not shown below,
        cue0 can also be used to snoop the packets A wants to send on

        Machine A:
        # ifconfig ex0
        # ifconfig vr0
        # route add -host
        # route add -host -mtu 1400
        # sysctl -w net.inet.ip.forwarding=1
        # tcpdump -n -s 2000 -p -i ex0 icmp

        Machine B:
        # ifconfig re0
        # ifconfig cue0 up
        # route add -net -netmask
        # ping -D -s 1472 -n -c 1
        # ping -D -s 1472 -n -c 1

        Note that the tcpdump on A sees the echo request and nothing
        more for the first ping, but sees a need-frag ICMP for the
        second.  With the bug, the MTU in the ICMP is 1500; when A is
        running a kernel with the fix below, it's 1400.
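
        The choice of -s 1472 in the pings above can be checked with
        a little arithmetic: 1472 bytes of payload plus an 8-byte
        ICMP echo header plus a 20-byte IPv4 header (assuming no IP
        options) gives exactly 1500 bytes.  The helper names below
        are hypothetical and exist only for this sketch.

```c
#include <assert.h>

enum {
	IPV4_HDR_LEN = 20,	/* IPv4 header without options */
	ICMP_ECHO_HDR_LEN = 8	/* ICMP echo request header */
};

/* Total on-the-wire IP packet length for an echo request with this payload. */
unsigned int
echo_packet_len(unsigned int payload)
{
	return IPV4_HDR_LEN + ICMP_ECHO_HDR_LEN + payload;
}

/* Nonzero if a packet with this payload cannot pass the given MTU unfragmented. */
int
exceeds_mtu(unsigned int payload, unsigned int mtu)
{
	return echo_packet_len(payload) > mtu;
}
```

        So a DF packet built this way just fits a 1500-byte MTU but
        cannot pass a 1400-byte route MTU, which is why a need-frag
        ICMP advertising 1500 is of no use to the sender.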
>Fix:
        Rather than duplicate the MTU logic from ip_output in ip_input,
        we can just use IP_RETURNMTU to have ip_output tell us what the
        necessary MTU is.  (Not that the logic is
        But having ip_output tell us what it decided it needs is more
        reliable than trusting two semantically equivalent pieces of
        code to stay in sync if/when their common task becomes more

        --- OLD/sys/netinet/ip_input.c  2008-02-14 21:03:51.000000000 -0500
        +++ NEW/sys/netinet/ip_input.c  2011-02-02 21:55:25.000000000 -0500
        @@ -1843,6 +1843,7 @@
                int error, type = 0, code = 0, destmtu = 0;
                struct mbuf *mcopy;
                n_long dest;
        +       int rmtu;
                 * We are now in the output path.
        @@ -1934,9 +1935,10 @@
        +       rmtu = 0;
                error = ip_output(m, (struct mbuf *)0, &ipforward_rt,
        -           (IP_FORWARDING | (ip_directedbcast ? IP_ALLOWBROADCAST : 
        -           (struct ip_moptions *)NULL, (struct socket *)NULL);
        +           (IP_FORWARDING | IP_RETURNMTU | (ip_directedbcast ? 
        +           (struct ip_moptions *)NULL, (struct socket *)NULL, &rmtu);
                if (error)
        @@ -1997,7 +1999,7 @@
        -                       destmtu = ipforward_rt.ro_rt->rt_ifp->if_mtu;
        +                       destmtu = rmtu ? : 
         #if defined(IPSEC) || defined(FAST_IPSEC)
                                if (sp != NULL) {
                                        /* count IPsec header size */
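
        The shape of the change, stripped of kernel detail, is an
        out-parameter: the output routine reports the MTU it
        enforced, and the caller uses that in the ICMP.  The sketch
        below uses hypothetical function names and is not the real
        ip_output interface.  The fallback used when rmtu comes back
        zero is cut off in the diff as archived, so the interface MTU
        used here is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for an output routine.  If the packet is too
 * big and DF is set, it fails with EMSGSIZE and, when asked, reports
 * the MTU it applied through mtu_ret.
 */
int
sketch_output(unsigned int pkt_len, int df, unsigned int path_mtu,
    int *mtu_ret)
{
	if (pkt_len <= path_mtu)
		return 0;		/* fits; sent as is */
	if (!df)
		return 0;		/* would be fragmented and sent */
	if (mtu_ret != NULL)
		*mtu_ret = (int)path_mtu;
	return EMSGSIZE;
}

/*
 * Hypothetical caller: returns the MTU to put in a need-frag ICMP,
 * or 0 if no such ICMP is needed.
 */
int
sketch_forward_destmtu(unsigned int pkt_len, int df,
    unsigned int path_mtu, unsigned int if_mtu)
{
	int rmtu = 0;
	int error;

	error = sketch_output(pkt_len, df, path_mtu, &rmtu);
	if (error != EMSGSIZE)
		return 0;
	/* Assumed fallback: interface MTU if nothing was reported. */
	return rmtu ? rmtu : (int)if_mtu;
}
```

        For a 1500-byte DF packet, a 1400-byte path MTU, and a
        1500-byte interface MTU, the caller ends up with 1400, the
        value the test setup above shows once the fix is in place.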

        "It works for me."

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML      
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B