Subject: Re: IP-IP v4 tunnel?
To: Mike M. Volokhov <mishka@netbsd.org>
From: Peter Eisch <peter@boku.net>
List: netbsd-users
Date: 12/27/2006 22:10:37
On 12/27/06 10:16 AM, "Mike M. Volokhov" <mishka@netbsd.org> wrote:

> Peter Eisch <peter@boku.net> wrote:
> [snip]
>> 20:59:10.146020 IP (tos 0x0, ttl 115, id 57089, offset 0, flags [DF], \
>>      length: 1431) SVR.80 > CLIENT.40033: FP [tcp sum ok] 519:1910(1391) \
>>     ack 420 win 65116
>> 20:59:10.146067 IP (tos 0x0, ttl 255, id 46179, offset 0, flags [none], \
>>      length: 56) FW > SVR: icmp 36: CLIENT unreachable - need to frag for \
>>     IP  (tos 0x0, ttl 114, id 57089, offset 0, flags [DF], length: 1431, \
>>     bad cksum 99ca (->9aca)!) SVR.80 > CLIENT.40033: [|tcp]
> 
> This looks like a sort of problem - wrong checksumming leading to
> ICMP message rejected by <SVR>, and as result broken PMTUD.
> 
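
As a sanity check on the two checksum values tcpdump prints, here is a minimal sketch of the RFC 1624 incremental checksum update in plain sh arithmetic. It assumes the only header field that differs between the two dumps is the TTL (115 in the original packet, 114 in the header quoted inside the ICMP message); the function names are made up for illustration.

```sh
# Fold a wider sum back into a 16-bit one's-complement value.
fold16() {
    s=$1
    while [ "$s" -gt 65535 ]; do
        s=$(( (s & 0xFFFF) + (s >> 16) ))
    done
    echo "$s"
}

# RFC 1624 incremental update: HC' = ~(~HC + ~m + m')
# $1 = old checksum, $2 = old 16-bit header word, $3 = new 16-bit header word
cksum_update() {
    sum=$(( (~$1 & 0xFFFF) + (~$2 & 0xFFFF) + ($3 & 0xFFFF) ))
    sum=$(fold16 "$sum")
    printf '%04x\n' $(( ~sum & 0xFFFF ))
}

# TTL and protocol share one header word: (ttl << 8) | proto, with proto 6 = TCP.
old_word=$(( (115 << 8) | 6 ))
new_word=$(( (114 << 8) | 6 ))
cksum_update 0x99ca "$old_word" "$new_word"    # prints 9aca
```

So if 99ca was the checksum of the header with ttl 115, then 9aca is what the same header needs with ttl 114. The "bad cksum" is therefore consistent with the TTL having been decremented in the quoted header without the checksum being refreshed; that alone does not prove a checksumming fault on the wire.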

If I understand this right, the reason the gif0 tunnel doesn't work with an
mtu of 1280 

> Could you show output of the following command on both <SVR> and <FW>:
> 
>   netstat -s -f inet | \
> egrep -e '^[a-z]+:' -e 'packets (received|sent)$' -e '(frag|sum)'
> 
SRV isn't my system; assume it to be www.cisco.com, for example.  On FW,
however, I can report the counters both BEFORE and AFTER trying to open that
site.

BEFORE:
ip:
        480693906 total packets received
        177 bad header checksums
        166860 fragments received
        0 fragments dropped (dup or out of space)
        0 fragments dropped (out of ipqent)
        0 malformed fragments dropped
        349 fragments dropped after timeout
        1561472 output datagrams fragmented
        1561472 fragments created
        30278 datagrams that can't be fragmented
icmp:
        33 bad checksums
igmp:
        0 messages received with bad checksum
tcp:
        7955002 packets sent
        4692425 packets received
                0 discarded for bad checksums
udp:
        0 with bad checksum
ipsec:


AFTER:
ip:
        480710171 total packets received
        177 bad header checksums
        166886 fragments received
        0 fragments dropped (dup or out of space)
        0 fragments dropped (out of ipqent)
        0 malformed fragments dropped
        349 fragments dropped after timeout
        1561542 output datagrams fragmented
        1561542 fragments created
        30318 datagrams that can't be fragmented
icmp:
        33 bad checksums
igmp:
        0 messages received with bad checksum
tcp:
        7955082 packets sent
        4692441 packets received
                0 discarded for bad checksums
udp:
        0 with bad checksum
ipsec:
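
To make the two snapshots easier to compare, here is a rough sketch that prints only the counters that changed. It assumes each snapshot was saved to a file in the format shown above; the function name and the file names in the usage comment are hypothetical.

```sh
# Print the counters that differ between two saved "netstat -s" snapshots.
# Usage: netstat_delta before.txt after.txt
netstat_delta() {
    awk '
        /^[a-z]+:/ { proto = $1; next }      # section header such as "ip:"
        $1 ~ /^[0-9]+$/ {
            n = $1
            $1 = ""                          # drop the count, keep the label
            sub(/^ +/, "")
            key = proto " " $0
            if (FNR == NR) { before[key] = n; next }   # first file: remember
            if ((key in before) && (n + 0) != (before[key] + 0))
                printf "%s: %+d\n", key, n - before[key]
        }
    ' "$1" "$2"
}
```

Worked by hand on the numbers above: "datagrams that can't be fragmented" went from 30278 to 30318 (+40) and "output datagrams fragmented" from 1561472 to 1561542 (+70), while "bad header checksums" stayed at 177 and the tcp "discarded for bad checksums" counter stayed at 0.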


> Also, because you're using wm(4) try to work without hardware
> checksumming and fragmentation. The following command may do the
> trick:
> 
>   ifconfig wm0 -udp4csum -tcp4csum -ip4csum -tso4
> 

None of those features were enabled.  I've tried with them enabled as well
and the results are the same.

>> If I raise the mtu on the gif interfaces to 1500, _everything_ works great
>> at the application layer.  This obviously introduces fragments, but if it's
>> the only way that works I guess I can keep it.
> 
> Hm. As far as I understand you have configuration similar to:
> 
>   [CLIENT]---[FW]===[SRV]
> 

My config is:  [CLIENT]=====[FW]---[Internet [SRV]]

CLIENT is also a NetBSD box (3.1.0_STABLE) doing NAT to the gif0's IP for the
actual client -- as seen in the previous tcpdump and log excerpt.  The tunnel
is between CLIENT and FW in my situation.

> <FW> and <SRV> are connected through gif(4). <FW> and <CLIENT> are
> linked via wm0 on <FW>. Then:
> 
> 1) Where the packets fragmented? Exact interface on exact host, please...

They get fragmented when they are written to the physical interfaces that
carry the gif0 tunnel.

> 2) Whom packets fragmented? I.e. who is src? And who is dst?

Whenever packets larger than the gif mtu (at either end) actually get written
to the physical interface, they get fragmented.  It takes a large mtu on the
gif0 for those packets to be written at all.
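
The size arithmetic behind this can be sketched as follows. This is a simplified model, not a description of the kernel code: it assumes a 20-byte outer IPv4 header for the IP-IP encapsulation, a 1500-byte MTU on the physical interface, DF set on the inner packet (as in the tcpdump excerpt), and an outer header that may be fragmented. None of those values are stated in this thread, and the function name is made up.

```sh
# Assumed values, not taken from the thread.
OUTER_HDR=20      # outer IPv4 header without options
PHYS_MTU=1500     # typical Ethernet MTU on the physical interface

# $1 = inner packet length (DF assumed set), $2 = gif MTU
tunnel_fate() {
    inner=$1
    gif_mtu=$2
    outer=$(( inner + OUTER_HDR ))
    if [ "$inner" -gt "$gif_mtu" ]; then
        echo "rejected: ICMP need-frag, inner $inner > gif mtu $gif_mtu"
    elif [ "$outer" -gt "$PHYS_MTU" ]; then
        echo "outer packet fragmented: $outer > $PHYS_MTU"
    else
        echo "sent whole: outer $outer <= $PHYS_MTU"
    fi
}

tunnel_fate 1431 1280   # rejected: ICMP need-frag, inner 1431 > gif mtu 1280
tunnel_fate 1431 1500   # sent whole: outer 1451 <= 1500
tunnel_fate 1500 1500   # outer packet fragmented: 1520 > 1500
```

Under these assumptions the 1431-byte packet from the trace would fit on the wire once encapsulated (1451 bytes), yet it is refused at a gif mtu of 1280 because DF is set; only full-size inner packets end up fragmented when the gif mtu is raised to 1500.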

> 3) Are all the hosts NetBSD based?

Yes.  CLIENT system is 3.1.0_STABLE and FW is 3.0_STABLE (both most recently
patched kernels).

> 4) Is PMTUD enabled on both <CLIENT> and <SRV>? See ip.mtudisc and
>  tcp.mssdflt sysctls.

As this is probably only pertinent on FW:

FW# sysctl -a | grep 'ip.mtudisc'
net.inet.ip.mtudisc = 1
net.inet.ip.mtudisctimeout = 600
FW# sysctl -a | grep 'tcp.mssdflt'
net.inet.tcp.mssdflt = 512
FW# 

Thanks much for your help on this,

peter