Subject: kern/32874: pf(4)'s route-to feature is not working properly, checksum errors
To: kern-bug-people@netbsd.org, gnats-admin@netbsd.org
From: None <ndehne@gmail.com>
List: netbsd-bugs
Date: 02/19/2006 07:20:00
>Number:         32874
>Category:       kern
>Synopsis:       pf(4)'s route-to feature is not working properly, checksum errors
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Feb 19 07:20:00 +0000 2006
>Originator:     Nino Dehne
>Release:        
>Organization:
>Environment:
NetBSD [...] 3.99.15 NetBSD 3.99.15 (WRAP) #0: Sat Feb 18 22:26:43 CET 2006  [...]:/var/tmp/wrap/HEAD/obj/sys/arch/i386/compile/WRAP i386

Problem exists on recent 3.0_STABLE as well.
>Description:
I have a router doing NAT and routing some IPv6 networks. It looks like this:
            |<------------ router --------->|
             pppoe0-tap0\               sip0 (unused)
             pppoe1-tap1--bridge0       sip1-----------LAN
             pppoe2-tap2//
DSL-----------------sip2/         gif0

The bridge(4)+tap(4) setup is necessary in order to be able to use multiple PPPoE accounts here, see http://blog.gmane.org/gmane.os.netbsd.devel.network/day=20050627.

pppoe0 carries a dynamic IPv4 address and the IPv4 default route as well as a gif0 tunnel with IPv6 net A and the IPv6 default route.

pppoe1 has a static IPv4 address.

pppoe2 has a native IPv6 network B.

In order to properly route outgoing traffic from the static IPv4 address as well as IPv6 network B, I'm using pf(4)'s route-to feature with a rule set that looks like this:

ext4_dyn_if = "pppoe0"
ext4_fix_if = "pppoe1"
ext6_if     = "pppoe2"
ext6_net    = "<IPv6 network B>"
ext6_def_if = "gif0"

# disabling scrubbing doesn't make the problem go away
scrub in all
scrub out all random-id max-mss 1452

pass out quick on $ext4_dyn_if route-to $ext4_fix_if inet from ($ext4_fix_if) to any
pass out quick on $ext6_def_if route-to $ext6_if inet6 from $ext6_net to any

Apart from some RFC1918 blocking rules and nat + rdr, the rule set is empty and default-pass.

This works for the IPv6 network B, i.e. traffic from that network is successfully routed out over pppoe2. However, outgoing IPv4 packets from the static address give me checksum errors, no matter what protocol I use (ICMP, UDP, TCP). I used Ethereal between the DSL modem and pppoe1 to visualize the problem.

Notice the lines

0030  00 00 22 eb f6 43 c1 9d 02 00 08 09 0a 0b 0c 0d
                                    ^^ ^^
vs.

0030  00 00 22 eb f6 43 c1 9d 02 00 00 00 0a 0b 0c 0d
                                    ^^ ^^

which I guess should be identical, since an echo reply is supposed to return the request's data unchanged. For every echo request, the bytes 08 09 come back as 00 00 in the reply, so I don't think this is a hardware fault that randomly toggles bits.
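
To pin down where the two rows differ, the following sketch compares them byte by byte and converts the positions into offsets relative to the IPv4 and ICMP headers. The header sizes (14-byte Ethernet, 6-byte PPPoE, 2-byte PPP, 20-byte IPv4 without options) are assumptions read off the capture in this report, and all names are made up for illustration.

```python
# Row at frame offset 0x0030, taken from the echo request and the echo reply.
ROW_OFFSET = 0x30
request_row = bytes.fromhex("00 00 22 eb f6 43 c1 9d 02 00 08 09 0a 0b 0c 0d")
reply_row = bytes.fromhex("00 00 22 eb f6 43 c1 9d 02 00 00 00 0a 0b 0c 0d")

# Assumed encapsulation: Ethernet + PPPoE + PPP in front of the IPv4 header.
IP_START = 14 + 6 + 2          # first byte of the IPv4 header in the frame
ICMP_START = IP_START + 20     # IPv4 header without options (IHL = 5)

# Frame offsets at which the two rows disagree.
diff_offsets = [
    ROW_OFFSET + i
    for i, (req, rep) in enumerate(zip(request_row, reply_row))
    if req != rep
]

for off in diff_offsets:
    print(f"frame 0x{off:02x}  ip +{off - IP_START}  icmp +{off - ICMP_START}")
# frame 0x3a  ip +36  icmp +16
# frame 0x3b  ip +37  icmp +17
```

Under these assumptions only two bytes differ, at offsets 36 and 37 from the start of the IPv4 header.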

No.     Time        Source                Destination           Protocol Info
   1135 9.736524    <external host>       <static address>           ICMP     Echo (ping) request

Frame 1135 (106 bytes on wire, 106 bytes captured)
Ethernet II, Src: [...] ([...]), Dst: [...] ([...])
PPP-over-Ethernet Session
Point-to-Point Protocol
Internet Protocol, Src: <external host> (<external host>), Dst: <static address> (<static address>)
Internet Control Message Protocol
    Type: 8 (Echo (ping) request)
    Code: 0 
    Checksum: 0xc80b [correct]
    Identifier: 0x6824
    Sequence number: 0x0000
    Data (56 bytes)

0000  xx xx xx xx xx xx xx xx xx xx xx xx 88 64 11 00   .........A.(.d..
0010  1a e3 00 56 00 21 45 00 00 54 74 a0 00 00 36 01   ...V.!E..Tt...6.
0020  bc 48 xx xx xx xx xx xx xx xx 08 00 c8 0b 68 24   .H....U.`(....h$
0030  00 00 22 eb f6 43 c1 9d 02 00 08 09 0a 0b 0c 0d   .."..C..........
0040  0e 0f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d   ................
0050  1e 1f 20 21 22 23 24 25 26 27 28 29 2a 2b 2c 2d   .. !"#$%&'()*+,-
0060  2e 2f 30 31 32 33 34 35 36 37                     ./01234567

No.     Time        Source                Destination           Protocol Info
   1136 9.736917    <static address>           <external host>        ICMP     Echo (ping) reply

Frame 1136 (106 bytes on wire, 106 bytes captured)
Ethernet II, Src: [...] ([...]), Dst: [...] ([...])
PPP-over-Ethernet Session
Point-to-Point Protocol
Internet Protocol, Src: <static address> (<static address>), Dst: <external host> (<external host>)
Internet Control Message Protocol
    Type: 0 (Echo (ping) reply)
    Code: 0 
    Checksum: 0xd00b [incorrect, should be 0xd814]
    Identifier: 0x6824
    Sequence number: 0x0000
    Data (56 bytes)

0000  xx xx xx xx xx xx xx xx xx xx xx xx 88 64 11 00   ...A.(.......d..
0010  1a e3 00 56 00 21 45 00 00 54 a6 41 00 00 ff 01   ...V.!E..T.A....
0020  c1 a6 xx xx xx xx xx xx xx xx 00 00 d0 0b 68 24   ..U.`(........h$
0030  00 00 22 eb f6 43 c1 9d 02 00 00 00 0a 0b 0c 0d   .."..C..........
0040  0e 0f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d   ................
0050  1e 1f 20 21 22 23 24 25 26 27 28 29 2a 2b 2c 2d   .. !"#$%&'()*+,-
0060  2e 2f 30 31 32 33 34 35 36 37                     ./01234567
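
The checksums in the two frames above can be recomputed from the ICMP bytes alone, because the ICMP checksum does not cover the (redacted) IP addresses. The sketch below is a plain implementation of the 16-bit one's-complement Internet checksum. The byte values are copied from the hex dumps, and the function and variable names are hypothetical.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement sum over big-endian words, then inverted."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(int.from_bytes(data[i:i + 2], "big")
                for i in range(0, len(data), 2))
    while total >> 16:   # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

IDENT_SEQ = bytes.fromhex("68240000")        # identifier 0x6824, sequence 0
timestamp = bytes.fromhex("22ebf643c19d0200")
pattern = bytes(range(0x08, 0x38))           # 08 09 0a ... 37

def icmp_checksum(icmp_type: int, payload: bytes) -> int:
    # Type, code 0, checksum field zeroed for the computation.
    header = bytes([icmp_type, 0, 0, 0]) + IDENT_SEQ
    return internet_checksum(header + payload)

original = timestamp + pattern                   # data as sent in the request
corrupted = timestamp + b"\x00\x00" + pattern[2:]  # 08 09 replaced by 00 00

request_sum = icmp_checksum(8, original)
reply_sum_original = icmp_checksum(0, original)
reply_sum_corrupted = icmp_checksum(0, corrupted)

print(hex(request_sum))          # 0xc80b, matches the request on the wire
print(hex(reply_sum_original))   # 0xd00b, the checksum the reply carries
print(hex(reply_sum_corrupted))  # 0xd814, what Ethereal says it should be
```

If the arithmetic holds, the checksum the reply carries (0xd00b) is the right one for an unmodified echo payload, and 0xd814 is what the payload with the two zeroed bytes would need. That would suggest the two bytes are cleared after the checksum has been computed, but this is an inference from a single capture, not a confirmed diagnosis.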


I also noticed that, when I leave the IPv4 route-to line enabled, sooner or later the box will panic with

   panic: m_copydata: m == 0, len 12

with varying values for len. This line was from 3.0_STABLE; I believe it read m == NULL on 3.99.15. I'm only guessing that the panic is related, based on the box having been stable so far whenever the IPv4 route-to line in pf.conf is disabled.

Somebody help please, I've been pulling my hair out over this one!

The issue was also brought up in http://mail-index.netbsd.org/tech-net/2006/01/03/0000.html but got no response.

If you need more information, mail me please.

Best regards,

ND
>How-To-Repeat:
Have two IPv4 WAN addresses. Route outgoing traffic from the non-default address to the proper interface with pf(4):

pass out quick on <default if> route-to <other if> inet from <second address> to any

Outgoing packets will reproducibly have wrong checksums.
>Fix: