Subject: kern/25087: ipfilter 4.1 NAT seems to cause fragmentation problems (probably related to gif)
To: None <gnats-bugs@gnats.NetBSD.org>
From: None <arto@selonen.org>
List: netbsd-bugs
Date: 04/07/2004 13:03:48
>Number:         25087
>Category:       kern
>Synopsis:       ipfilter 4.1 NAT seems to cause fragmentation problems (probably related to gif)
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Apr 07 13:04:01 UTC 2004
>Closed-Date:
>Last-Modified:
>Originator:     Arto Selonen
>Release:        NetBSD-current from ~March 29th, 2004
>Organization:
>Environment:
NetBSD blah 1.6ZL NetBSD 1.6ZL (BLAH) #30: Mon Mar 29 10:07:04 EEST 2004  blah@blah:/obj/sys/arch/i386/compile/BLAH i386
>Description:
The following has been collected from the thread starting at
http://mail-index.netbsd.org/current-users/2004/03/29/0015.html
and edited for readability. I may well have left out some relevant
information, so I am more than happy to provide anything else that is
needed, including testing patches (although on a relatively slow
schedule, since I would rather not reboot the affected system too often)
and/or making test runs.

Here is the network setup:
S<->plain<->(ex0)GWS(ep0)<->IPSEC<->(fxp0)GWD(fxp1)<->plain<->(fxp0)D

S:  source host; could be anything, but in this case it is Windows.
    Connected to the 192.168.x/24 network on ex0 of GWS
GWS: NetBSD-current from ~Dec04; default gateway for S
    - ep0 connected to Internet
    - has the following gif-interface:
	tunnel inet GWS.e.p.0 --> G.W.D.0
	inet 10.0.0.2 -> 10.0.0.1 netmask 0xfffffffc
    - static route: 10.0.0.1 as gateway to fxp1@GWD
    - NAT rules for hiding 192.168/16 on ep0
      (only for Internet access; should not be used here)
    - transport mode IPSEC for GWS/GWD using ESP+AH
GWD: NetBSD-current upgraded from ~Feb25 to ~Mar29
    - default gateway for D
    - public IP addresses on both fxp interfaces
    - has the following gif-interface:
	tunnel inet G.W.D.0 --> GWS.e.p.0
	inet 10.0.0.1 -> 10.0.0.2 netmask 0xfffffffc
    - static route: 10.0.0.2 as gateway to ex0@GWS
    - NAT rules for hiding 192.168/16 and 10/8 on fxp1
    - transport mode IPSEC for GWD/GWS using ESP+AH
D:  destination host; NetBSD-current from ~Mar01
    - also tested with other destination systems (Tru64, Linux)
    - using a public IP address
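
As a rough illustration of why packets "close to MTU" are the ones that
matter: traffic from D to S leaves GWD through the gif tunnel, which wraps
each packet in an outer 20-byte IPv4 header, and transport-mode ESP+AH adds
further bytes on top. The sketch below only does that arithmetic. The
1500-byte link MTU and the 60-byte IPsec overhead are assumptions for
illustration (real ESP+AH overhead depends on algorithms, IV length and
padding), and the function names are hypothetical.

```python
# Rough arithmetic sketch: size of a packet from D after GWD wraps it in the
# gif tunnel (IPv4-in-IPv4) and applies transport-mode IPsec.
# ASSUMPTION: the IPsec overhead is a single illustrative number; real ESP+AH
# overhead depends on the algorithms, IV length and padding.

OUTER_IPV4_HEADER = 20  # outer IPv4 header added by gif (no IP options)


def encapsulated_size(inner_len: int, ipsec_overhead: int) -> int:
    """Bytes on the wire after gif encapsulation plus IPsec processing."""
    return inner_len + OUTER_IPV4_HEADER + ipsec_overhead


def largest_unfragmented_inner(link_mtu: int, ipsec_overhead: int) -> int:
    """Largest inner packet that still fits into a single outer packet."""
    return link_mtu - OUTER_IPV4_HEADER - ipsec_overhead


if __name__ == "__main__":
    LINK_MTU = 1500        # assumed Ethernet MTU towards GWS
    IPSEC_OVERHEAD = 60    # hypothetical ESP+AH overhead
    print(encapsulated_size(1500, IPSEC_OVERHEAD))               # 1580
    print(largest_unfragmented_inner(LINK_MTU, IPSEC_OVERHEAD))  # 1420
```

With these assumed numbers a full 1500-byte packet from D cannot cross the
tunnel in one piece and has to be fragmented (or rejected if DF is set),
while smaller packets pass untouched.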

And then the problem description:

When making TCP connections (tested with SSH and HTTP) from S to D, the
connection is established, but if D sends packets that are close to the
MTU, they never reach S and the connection is effectively dead.
Running tcpdump on GWD's fxp1 interface shows that about 7 seconds after
such a "large" packet is received on fxp1, GWD sends D the following
ICMP: "GWD > D: icmp: ip reassembly time exceeded".
This does not seem to have much effect on D, so D remains unable to
send to S through GWD.
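
For reference, "ip reassembly time exceeded" is ICMP type 11, code 1. Per
RFC 792, a host reassembling a fragmented datagram that cannot complete it
within its time limit discards the datagram and may send this message, but
only if it holds fragment zero. So the message suggests that GWD kept some
fragments of D's datagram (including the first) and never saw the rest.
The sketch below models that generic behaviour only; it is not NetBSD's
ip_reass() or ipfilter's fragment cache, and the class name, key layout and
timeout value are hypothetical.

```python
# Simplified model of IPv4 fragment reassembly with a timeout (RFC 791/792).
# Hypothetical names; NOT NetBSD's ip_reass() or ipfilter's fragment cache.
from dataclasses import dataclass, field


@dataclass
class Pending:
    first_seen: float
    have_first: bool = False   # fragment with offset 0 received
    have_last: bool = False    # fragment with MF=0 received
    ranges: list = field(default_factory=list)  # received (start, end) byte ranges


class Reassembler:
    def __init__(self, timeout: float):
        self.timeout = timeout
        self.pending = {}  # key: (src, dst, ip_id, proto) -> Pending

    def add(self, key, offset: int, length: int, more_fragments: bool, now: float) -> bool:
        """Record one fragment; return True once the datagram is complete."""
        p = self.pending.setdefault(key, Pending(first_seen=now))
        p.ranges.append((offset, offset + length))
        if offset == 0:
            p.have_first = True
        if not more_fragments:
            p.have_last = True
        if p.have_first and p.have_last and self._contiguous(p.ranges):
            del self.pending[key]
            return True
        return False

    @staticmethod
    def _contiguous(ranges) -> bool:
        """True if the sorted ranges leave no hole starting from offset 0."""
        end = 0
        for start, stop in sorted(ranges):
            if start > end:
                return False
            end = max(end, stop)
        return True

    def expire(self, now: float):
        """Drop timed-out datagrams; return the keys for which ICMP type 11
        code 1 may be sent (RFC 792: only if fragment zero was received)."""
        notify = []
        for key, p in list(self.pending.items()):
            if now - p.first_seen >= self.timeout:
                del self.pending[key]
                if p.have_first:
                    notify.append(key)
        return notify
```

For example, if only the first fragment of a datagram arrives, expire()
returns its key once the timeout has passed, which corresponds to the
"reassembly time exceeded" reply seen on fxp1.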

I also tested similar TCP connections from GWS to D, and in those
cases GWD responded to D with the following ICMP:
"GWD > D: icmp: GWS unreachable - need to frag". After about 10 seconds
D would stop setting DF on its packets and start retransmitting from the
beginning, so the connection worked normally after the initial delay.
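
The GWS-to-D behaviour matches ordinary IPv4 forwarding with path MTU
discovery (RFC 791, RFC 1191): a router that must forward a packet larger
than the next-hop MTU either fragments it or, if DF is set, drops it and
returns ICMP "need to frag" (type 3, code 4) carrying the next-hop MTU.
The sketch below models only that decision and the resulting fragment
count; it is a simplified illustration, not NetBSD's ip_output(), and the
function name and example MTU values are hypothetical.

```python
# Simplified model of the IPv4 forwarding decision for oversized packets
# (RFC 791 fragmentation, RFC 1191 path MTU discovery).
# Hypothetical helper; NOT NetBSD's ip_output().


def forward_decision(total_len: int, header_len: int, mtu: int, df: bool):
    """Return ("forward", 1), ("fragment", n) or ("icmp-need-frag", mtu)."""
    if total_len <= mtu:
        return ("forward", 1)
    if df:
        # Drop the packet and report the next-hop MTU (ICMP type 3, code 4).
        return ("icmp-need-frag", mtu)
    # Each fragment carries at most (mtu - header_len) payload bytes, rounded
    # down to a multiple of 8 because fragment offsets are in 8-byte units.
    per_fragment = (mtu - header_len) // 8 * 8
    payload = total_len - header_len
    # Integer ceiling division: number of fragments needed for the payload.
    return ("fragment", (payload + per_fragment - 1) // per_fragment)
```

For instance, a 1500-byte packet with DF set facing a hypothetical
1280-byte next-hop MTU yields the need-frag reply; the same packet without
DF is split into two fragments.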

Before GWD was upgraded to a version including ipfilter 4.1.1 it
functioned normally, and no problems were detected with any of the above
connection types. To narrow down the cause, I disabled IPSEC (but left
the gif interfaces and routes in place), and the same problem was still
observed. IPSEC was then enabled again, and the NAT rules on fxp1@GWD for
the reserved address ranges 10/8 and 192.168/16 (which show up due to the
IPSEC use) were removed. This "solved" the problem and pointed to ipfilter.

Since this is the only setup I have where connections are made to
target systems and the source address is rewritten along the way by
ipfilter 4.1.1 NAT, I am not sure whether this is a generic problem or
whether it is related to the gif interface. I am not going to test that
scenario unless somebody specifically requests such data points, since
I now have a temporary workaround. I would much prefer to have the NAT
in place, though.

No problems have been observed with other NAT rules. I have used RDR
rules, apparently successfully, for a transparent web cache/proxy and
for redirecting some DNS traffic. The FTP proxy also seems to be working.
NAT rules on GWS work nicely as well, and connections from S to the
Internet work without problems.
>How-To-Repeat:
I hope this can be triggered with the following simpler case:

A --> GW --> B

GW uses NAT on the interface connecting to B to hide the address used
by A, replacing it with its own. Make TCP connections from A to B and
get enough data sent back that B starts using ~MTU-sized packets
(ls -lR / seems to be good for SSH tests). Observe the traffic between
GW and B. If this triggers the problem, packets sent by B never reach A,
and GW sends ICMP messages to B indicating fragmentation problems.

I am not sure whether this is enough to trigger the problem. If not, try
to reproduce the whole setup described above, or ask me for further
details.

>Fix:

>Release-Note:
>Audit-Trail:
>Unformatted: