Subject: kern/29529: ipfilter fastrouted packets corrupted
To: None <kern-bug-people@netbsd.org, gnats-admin@netbsd.org,>
From: None <arto@selonen.org>
List: netbsd-bugs
Date: 02/25/2005 13:23:00
>Number:         29529
>Category:       kern
>Synopsis:       ipfilter fastrouted packets corrupted
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Feb 25 13:23:00 +0000 2005
>Originator:     Arto Selonen
>Release:        NetBSD 2.99.16 (~20050224)
>Organization:
>Environment:
NetBSD blah 2.99.16 NetBSD 2.99.16 (BLAH) #12: Thu Feb 24 09:50:34 EET 2005  blah@blah:/obj/sys/arch/i386/compile/BLAH i386

>Description:
A NetBSD-current box with several wm(4) interfaces acts as a
gateway/firewall between the interfaces. Sources were updated on
20050224 from the us2 anoncvs mirror. Before the upgrade it was running
ipf-413 from -current, and fastrouted packets were affected by
kern/27079. Prior to kern/27079 the box was forwarding fastrouted
packets without problems.

A simplified view of the network setup, showing only the two NICs
between which the problem is seen:

  PUB-NET <---> IPF-416-NBSD-C <---> IANA-NET

After the latest upgrade to ipf-416, fastrouted packets seem to get
corrupted in a different manner. Here is a packet seen by tcpdump run on
an old Linux laptop in IANA-NET, after the packet has been fastrouted
through the box (the laptop was the target of an SSH connection
attempt made from a desktop system on the PUB side):

13:58:22.658571 truncated-ip - 15300 bytes missing!client.example.com.52680 > 192.168.242.231.ssh:
3796056737:3796072037(15300) win 32768 <mss 1460,
nop,wscale 0,nop,nop,timestamp 84 0>

On Feb 19th, similar problems were reported on current-users:
http://mail-index.netbsd.org/current-users/2005/02/19/0012.html

Here is the relevant ipf.conf rule for the initial TCP/SYN packet
for the public network side (wm0):

pass in log first quick on wm0 to wm2 proto tcp from pub.net/24 to any
flags S keep state group 10101

And here are the relevant ipnat.conf rules for traffic relating to wm0
or wm2 interfaces:

map wm0 0/0 -> public.ip/32 proxy port ftp ftp/tcp
map wm0 192.168.242.0/24 -> public.ip/32 portmap tcp/udp 1025:65000
map wm0 192.168.242.0/24 -> public.ip/32

Also, ipmon logged the following:
(note that the clock may be off in the previous tcpdump output)

24/02/2005 13:48:49.446255 STATE:NEW ip.pub.net,52680 ->
192.168.242.231,22 PR tcp
24/02/2005 13:48:49.446257 wm0 @10101:1 p ip.pub.net,52680 ->
192.168.242.231,22 PR tcp len 20 60 -S K-S IN
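
The numbers in the tcpdump and ipmon output above can be cross-checked
with a little arithmetic. The sketch below assumes that "len 20 60" in
the ipmon line means an IP header length of 20 and an IP total length
of 60 bytes, and it tests one hypothesis that is not confirmed anywhere
in this report: that the 16-bit IP total length field was byte-swapped
on the fastroute path. The helper name swap16 is made up for this
illustration.

```python
def swap16(value: int) -> int:
    """Swap the two bytes of a 16-bit value (host <-> network order)."""
    return ((value & 0xFF) << 8) | ((value >> 8) & 0xFF)


# Values read from the ipmon line "len 20 60" (interpretation assumed).
IP_HEADER_LEN = 20
IP_TOTAL_LEN = 60
# TCP header: 20 bytes plus 20 bytes of options
# (mss 4 + nop 1 + wscale 3 + nop 1 + nop 1 + timestamp 10).
TCP_HEADER_LEN = IP_TOTAL_LEN - IP_HEADER_LEN

# Hypothesis only: the total length 0x003c went out as 0x3c00.
swapped_len = swap16(IP_TOTAL_LEN)

# Bytes the receiver would expect but never get.
missing = swapped_len - IP_TOTAL_LEN

# TCP payload length tcpdump would derive from the swapped length.
claimed_payload = swapped_len - IP_HEADER_LEN - TCP_HEADER_LEN

# Sequence range printed by tcpdump: 3796056737:3796072037
seq_span = 3796072037 - 3796056737

print(swapped_len, missing, claimed_payload, seq_span)
# prints: 15360 15300 15300 15300
```

Under that assumption, both the "15300 bytes missing" message and the
15300-byte sequence range follow from a 60-byte SYN whose length field
reads 15360. This is only a consistency check on the captured numbers,
not a diagnosis of where in the code the corruption happens.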

TCP connections work OK from the IANA side to the public side.
NAT seems to be working OK, at least for simple iana->pub connections.
However, there is another problem that could be related to this PR:
FTP connections from the IANA side to the Internet cause a panic on
another NetBSD-current box acting as a gateway/firewall between the
Internet and the above-mentioned public network. An expanded network
setup:

internet <--> ipf-413-NBSD-C <--> pub-net <--> ipf-416-NBSD-C <--> iana

That is, the ipf-413 box panics on some FTP traffic belonging to an
FTP connection made from iana through the ipf-416 and ipf-413 boxes to
an FTP server on the Internet. So far we have not been able to
reproduce the panic reliably, so the exact details are not known.

In case it may have something to do with this PR, here are just the
function calls from a backtrace in the ipf-413 box after a panic:

db> tr
cpu_Debugger
m_copydata
ippr_ftp_in
appr_check
fr_natin
fr_checknatin
fr_check
fr_check_wrapper
pfil_run_hooks
ip_input
ipintr

If somebody thinks those panics warrant a PR of their own, please let
me know. I'm reporting the panic here because its trigger may be
ipf-416 related, while the bug itself is in ipf-413 and may no longer
be present in ipf-416 (so nobody might be interested in fixing it
anyway). Because of this PR, I am not keen on simply upgrading that
ipf-413 box: ipf-416 evidently has at least one problem of its own
that corrupts forwarded packets.

I haven't had much time to debug the fastroute problem yet, but I'll
try to collect more information on it next week. I am also happy to
produce any test case results, additional debug data, configuration
information etc. that might help in fixing this problem (or either
of these problems: panic on ipf-413 or fastroute corruption on ipf-416).
I can also test patches against NetBSD-current sources.

>How-To-Repeat:
Hopefully, it is enough to set up a box with two interfaces and
a fastroute ipfilter rule for those interfaces. Then try to make
a TCP connection through the box using the fastroute rule.
Observe corrupted packets on the receiving end.

If this cannot be easily reproduced, I am willing to try to produce
any test/debug data needed, when possible.
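
The connection attempt in the steps above can be driven from any
client on the public side. Below is a minimal sketch; the function
name try_connect and the 5 second timeout are made up for this
illustration, and any TCP client (ssh, telnet) would do equally well.

```python
import socket


def try_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Attempt one TCP connection; return True if the handshake completes."""
    try:
        # create_connection sends the SYN that the fastroute rule
        # (flags S keep state) is expected to match.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and unreachable hosts.
        return False
```

For the setup in this report, that would be something like
try_connect("192.168.242.231", 22) run on a PUB-NET host, with tcpdump
running on the target to capture the forwarded SYN. If the SYN arrives
corrupted, the call is expected to return False once the timeout
expires.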
>Fix: