Subject: IPv6 PMTUD broken
To: None <current-users@NetBSD.org>
From: Ronald van der Pol <Ronald.vanderPol@rvdp.org>
List: current-users
Date: 06/02/2004 17:24:50
It looks like IPv6 PMTUD is not working correctly. This is on
NetBSD 2.0F i386, but it also happens on -stable.

Setup:
                           PPTP
bones ----------- wormhole --- ADSL modem in bridge mode --------- einstein
                  (router)

So, the IPv6 path is bones - wormhole - einstein.
wormhole is a Soekris running NetBSD 1.6.2.

bones: 2001:888:1777:0:260:8ff:fed1:a403
wormhole: 2001:888:1777::1 (fe80::220:afff:fec6:4faa)
einstein: 2001:7b8:206:1:240:f4ff:fe37:8232

bones runs a nameserver that sends a large UDP reply to einstein.

wormhole sends an icmp6 "too big" (mtu 1280) to bones.

bones updates its routing table with a route to einstein with mtu=1280.

bones keeps sending fragments sized for mtu=1500.

I have done some kernel debugging and it looks like the mtu in the
route object is zero. It is then set to the link mtu of 1500.

This is the routing table entry on bones (netstat -nr):
2001:7b8:206:1:240:f4ff:fe37:8232  fe80::220:afff:fec6:4faa%ex0   UGHD        0        0   1280  ex0

This is the tcpdump on bones:

15:29:50.278947 2001:7b8:206:1:240:f4ff:fe37:8232.49154 > 2001:888:1777:0:260:8ff:fed1:a403.53: [udp sum ok]  43513+ [1au] AAAA? foo.rvdp.org. ar: . OPT UDPsize=4096 (41) [flowlabel 0x65043] (len 49, hlim 58)
15:29:50.294975 2001:888:1777:0:260:8ff:fed1:a403 > 2001:7b8:206:1:240:f4ff:fe37:8232: frag (0x92239c02:0|1448) 53 > 49154:  43513 q: AAAA? foo.rvdp.org. 128/2/3 foo.rvdp.org. AAAA dead:beef::, foo.rvdp.org. AAAA dead:beef::1, foo.rvdp.org. AAAA dead:beef::2, .... (large output)
15:29:50.295130 2001:888:1777:0:260:8ff:fed1:a403 > 2001:7b8:206:1:240:f4ff:fe37:8232: frag (0x92239c02:1448|1448) (len 1456, hlim 64)
15:29:50.295151 2001:888:1777:0:260:8ff:fed1:a403 > 2001:7b8:206:1:240:f4ff:fe37:8232: frag (0x92239c02:2896|831) (len 839, hlim 64)
15:29:50.299448 2001:888:1777::1 > 2001:888:1777:0:260:8ff:fed1:a403: icmp6: too big 1280 (len 1240, hlim 64)
15:29:50.300493 2001:888:1777::1 > 2001:888:1777:0:260:8ff:fed1:a403: icmp6: too big 1280 (len 1240, hlim 64)
15:29:55.297112 2001:7b8:206:1:240:f4ff:fe37:8232.49154 > 2001:888:1777:0:260:8ff:fed1:a403.53: [udp sum ok]  43513+ [1au] AAAA? foo.rvdp.org. ar: . OPT UDPsize=4096 (41) [flowlabel 0x65043] (len 49, hlim 58)
15:29:55.303223 fe80::260:8ff:fed1:a403 > fe80::220:afff:fec6:4faa: icmp6: neighbor sol: who has fe80::220:afff:fec6:4faa(src lladdr: 00:60:08:d1:a4:03) (len 32, hlim 255)
15:29:55.303931 fe80::220:afff:fec6:4faa > fe80::260:8ff:fed1:a403: icmp6: neighbor adv: tgt is fe80::220:afff:fec6:4faa(RS) (len 24, hlim 255)
15:29:55.319179 2001:888:1777:0:260:8ff:fed1:a403 > 2001:7b8:206:1:240:f4ff:fe37:8232: frag (0xc59bb0ba:0|1448) 53 > 49154:  43513 q: AAAA? foo.rvdp.org. 128/2/3 foo.rvdp.org. AAAA dead:beef::, foo.rvdp.org. AAAA dead:beef::1, foo.rvdp.org. AAAA dead:beef::2, .... (large output)
15:29:55.319331 2001:888:1777:0:260:8ff:fed1:a403 > 2001:7b8:206:1:240:f4ff:fe37:8232: frag (0xc59bb0ba:1448|1448) (len 1456, hlim 64)
15:29:55.319353 2001:888:1777:0:260:8ff:fed1:a403 > 2001:7b8:206:1:240:f4ff:fe37:8232: frag (0xc59bb0ba:2896|831) (len 839, hlim 64)
15:29:55.323713 2001:888:1777::1 > 2001:888:1777:0:260:8ff:fed1:a403: icmp6: too big 1280 (len 1240, hlim 64)
15:29:55.324758 2001:888:1777::1 > 2001:888:1777:0:260:8ff:fed1:a403: icmp6: too big 1280 (len 1240, hlim 64)
15:30:00.635584 2001:888:1777::1 > 2001:888:1777:0:260:8ff:fed1:a403: icmp6: neighbor sol: who has 2001:888:1777:0:260:8ff:fed1:a403(src lladdr: 00:20:af:c6:4f:aa) (len 32, hlim 255)
15:30:00.635767 fe80::220:afff:fec6:4faa > fe80::260:8ff:fed1:a403: icmp6: neighbor sol: who has fe80::260:8ff:fed1:a403(src lladdr: 00:20:af:c6:4f:aa) (len 32, hlim 255)
15:30:00.648981 2001:888:1777:0:260:8ff:fed1:a403 > 2001:888:1777::1: icmp6: neighbor adv: tgt is 2001:888:1777:0:260:8ff:fed1:a403(S) (len 24, hlim 255)
15:30:00.662144 fe80::260:8ff:fed1:a403 > fe80::220:afff:fec6:4faa: icmp6: neighbor adv: tgt is fe80::260:8ff:fed1:a403(S) (len 24, hlim 255)
15:30:05.654041 2001:888:1777:0:260:8ff:fed1:a403 > 2001:888:1777::1: icmp6: neighbor sol: who has 2001:888:1777::1(src lladdr: 00:60:08:d1:a4:03) (len 32, hlim 255)
15:30:05.654727 2001:888:1777::1 > 2001:888:1777:0:260:8ff:fed1:a403: icmp6: neighbor adv: tgt is 2001:888:1777::1(RS) (len 24, hlim 255)
^C
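
The fragment sizes in the dump match a 1500-byte MTU rather than 1280.
A minimal sketch of the arithmetic, assuming only the fixed 40-byte IPv6
header and the 8-byte fragment header sit in front of each fragment (no
other extension headers), and that fragment payloads are rounded down to
a multiple of 8. The function and macro names are made up for this
illustration and are not taken from the kernel:

```c
/* Hypothetical helper, not kernel code. */
#define SK_IP6_HDRLEN   40u  /* fixed IPv6 header */
#define SK_IP6_FRAGLEN   8u  /* IPv6 fragment header */

/*
 * Largest payload of a non-final fragment for a given MTU.
 * Fragment offsets are counted in 8-byte units, so round down to 8.
 * Returns 0 if the MTU cannot hold both headers plus 8 bytes of data.
 */
static unsigned int
sketch_frag_payload(unsigned int mtu)
{
    if (mtu < SK_IP6_HDRLEN + SK_IP6_FRAGLEN + 8u)
        return 0;
    return (mtu - SK_IP6_HDRLEN - SK_IP6_FRAGLEN) & ~7u;
}
```

With mtu=1500 this gives 1448, which is the "0|1448" seen above (len 1456
is 1448 plus the fragment header). With mtu=1280 it would give 1232.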

This is at the start of ip6_output():

Breakpoint 9, ip6_output (m0=0xc0a94400, opt=0xc4e5ed54, ro=0xc0a92474, 
    flags=0, im6o=0x0, so=0xc0a8c7d0, ifpp=0x0)
    at /export/NetBSD-current/src/sys/netinet6/ip6_output.c:183
183             bzero(&exthdrs, sizeof(exthdrs));
(gdb) print/x *ro 
$74 = {ro_rt = 0xc0a8eb58, ro_dst = {sin6_len = 0x1c, sin6_family = 0x18, 
    sin6_port = 0x0, sin6_flowinfo = 0x0, sin6_addr = {__u6_addr = {
        __u6_addr8 = {0x20, 0x1, 0x7, 0xb8, 0x2, 0x6, 0x0, 0x1, 0x2, 0x40, 
          0xf4, 0xff, 0xfe, 0x37, 0x82, 0x32}, __u6_addr16 = {0x120, 0xb807, 
          0x602, 0x100, 0x4002, 0xfff4, 0x37fe, 0x3282}, __u6_addr32 = {
          0xb8070120, 0x1000602, 0xfff44002, 0x328237fe}}}, 
    sin6_scope_id = 0x0}}
(gdb) print/x *ro->ro_rt
$75 = {rt_nodes = {{rn_mklist = 0xc09301c0, rn_p = 0xc0a8e948, rn_b = 0xffff, 
      rn_bmask = 0x0, rn_flags = 0x4, rn_u = {rn_leaf = {rn_Key = 0xc0a93d40, 
          rn_Mask = 0xc09a9880, rn_Dupedkey = 0x0}, rn_node = {
          rn_Off = 0xc0a93d40, rn_L = 0xc09a9880, rn_R = 0x0}}}, {
      rn_mklist = 0x0, rn_p = 0x0, rn_b = 0x0, rn_bmask = 0x0, rn_flags = 0x0, 
      rn_u = {rn_leaf = {rn_Key = 0x0, rn_Mask = 0x0, rn_Dupedkey = 0x0}, 
        rn_node = {rn_Off = 0x0, rn_L = 0x0, rn_R = 0x0}}}}, 
  rt_gateway = 0xc0a93d5c, rt_flags = 0x3, rt_refcnt = 0x1, rt_use = 0x34, 
  rt_ifp = 0xc09ee038, rt_ifa = 0xc0a1a100, rt_genmask = 0x0, rt_llinfo = 0x0, 
  rt_rmx = {rmx_locks = 0x0, rmx_mtu = 0x0, rmx_hopcount = 0x0, 
    rmx_expire = 0x0, rmx_recvpipe = 0x0, rmx_sendpipe = 0x0, 
    rmx_ssthresh = 0x0, rmx_rtt = 0x0, rmx_rttvar = 0x0, rmx_pksent = 0x0}, 
  rt_gwroute = 0xc0a8ebdc, rt_timer = {lh_first = 0x0}, rt_parent = 0x0}
(gdb)
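
To compare rt_flags from the dump with the flags column of netstat, here
is a small decoding sketch. It assumes the usual BSD bit values from
net/route.h (RTF_UP 0x1, RTF_GATEWAY 0x2, RTF_HOST 0x4, RTF_DYNAMIC 0x10)
and handles only those four bits; the function name is made up:

```c
#include <stddef.h>

/*
 * Hypothetical helper: turn a subset of route flags into netstat-style
 * letters. Bit values are assumed to match net/route.h.
 */
static const char *
sketch_rtflags(unsigned int flags, char *buf, size_t len)
{
    static const struct {
        unsigned int bit;
        char letter;
    } map[] = {
        { 0x01, 'U' },  /* RTF_UP */
        { 0x02, 'G' },  /* RTF_GATEWAY */
        { 0x04, 'H' },  /* RTF_HOST */
        { 0x10, 'D' },  /* RTF_DYNAMIC */
    };
    size_t i, n = 0;

    if (len == 0)
        return buf;
    for (i = 0; i < sizeof(map) / sizeof(map[0]); i++) {
        /* Keep one byte free for the terminating NUL. */
        if ((flags & map[i].bit) != 0 && n + 1 < len)
            buf[n++] = map[i].letter;
    }
    buf[n] = '\0';
    return buf;
}
```

Under that assumption, rt_flags = 0x3 decodes to "UG", while the netstat
entry with mtu 1280 is "UGHD". So the route cached in ro may not be the
host route that carries the 1280 mtu.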

I think ro is passed to ip6_getpmtu(). In lines 1277 to 1279 the mtu is
set to ifmtu.
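
A simplified model of the behaviour I am describing, not the actual code
from ip6_output.c. The struct and function names are stand-ins invented
for this sketch, and treating a missing route like an unset mtu is an
assumption:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the route metrics and the interface. */
struct sketch_route {
    uint32_t rmx_mtu;   /* path MTU stored in the route, 0 = not set */
};

struct sketch_ifnet {
    uint32_t if_mtu;    /* link MTU of the outgoing interface */
};

/*
 * Pick the MTU for an outgoing packet: use the route's mtu if it is
 * set, otherwise fall back to the link MTU (ifmtu).
 */
static uint32_t
sketch_getpmtu(const struct sketch_route *rt, const struct sketch_ifnet *ifp)
{
    uint32_t ifmtu = ifp->if_mtu;
    uint32_t mtu = (rt != NULL) ? rt->rmx_mtu : 0;

    if (mtu == 0)
        mtu = ifmtu;    /* the fallback that yields 1500 here */
    return mtu;
}
```

With rmx_mtu = 0 as in the dump above and a link MTU of 1500, this
returns 1500. A route carrying rmx_mtu = 1280 would return 1280.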

I don't have enough knowledge of the networking code to know how this logic
is supposed to work.

	rvdp