Subject: Re: Default value of net.inet.ipsec.dfbit breaks PMTU over IPsec tunnels
To: Thor Lancelot Simon <>
From: Daniel Carosone <>
List: tech-net
Date: 05/29/2004 07:34:09

There are two issues here that have become somewhat conflated. They
are linked, but not the same.  Consider a generic network like so:

    A----B====C----D

A--D is cleartext IP, B==C is an ipsec tunnel. There are intermediate
hops between each of these, not shown.

The first issue is whether PMTUD exposes the narrower tunnel as part
of the A--D path. This is an issue for processing of the CLEARTEXT
packets by B and C, on input, just like any other router about to send
a packet down a narrower interface.  When processing a DF packet for
ESP encapsulation, if fragmentation would be required, the packet
should be dropped and an ICMP "fragmentation needed" generated that
advertises the tunnel's effective MTU. Some hosts don't even get this
much right.
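To make the first issue concrete, here is a minimal sketch in C of the
decision B (or C) faces for a cleartext packet headed into the tunnel.
The function and enum names are hypothetical, not the NetBSD code, and
the tunnel overhead is passed in as an assumed worst-case constant
rather than computed from the SA.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Possible outcomes for a cleartext packet arriving at tunnel endpoint B. */
enum ingress_action {
    INGRESS_ENCAPSULATE, /* fits: encrypt and send */
    INGRESS_FRAGMENT,    /* too big, DF clear: fragment, then encrypt */
    INGRESS_DROP_ICMP    /* too big, DF set: drop, send ICMP need-frag */
};

/*
 * Hypothetical helper, not NetBSD's.
 * link_mtu:        MTU of the interface the ESP packet will leave on
 * tunnel_overhead: assumed worst-case bytes added by ESP tunnel mode
 * inner_len:       total length of the cleartext packet from A
 * df:              DF bit of the cleartext packet
 * icmp_mtu:        out; next-hop MTU to report to A when dropping
 */
enum ingress_action
tunnel_ingress(size_t link_mtu, size_t tunnel_overhead, size_t inner_len,
               bool df, size_t *icmp_mtu)
{
    assert(tunnel_overhead < link_mtu);
    size_t inner_mtu = link_mtu - tunnel_overhead;

    if (inner_len <= inner_mtu)
        return INGRESS_ENCAPSULATE;
    if (df) {
        /* Same as any router facing a narrower link: report the
         * effective MTU of the tunnel, not that of the physical link. */
        *icmp_mtu = inner_mtu;
        return INGRESS_DROP_ICMP;
    }
    return INGRESS_FRAGMENT;
}
```

With a 1500-byte link and 73 bytes of assumed overhead (roughly the
worst case for AES-CBC with HMAC-SHA1-96: 20 outer IP, 8 ESP header,
16 IV, up to 15 padding, 2 trailer, 12 ICV), a 1500-byte DF packet from
A is dropped and A is told to use 1427.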

The second issue is whether PMTUD can discover even smaller path
segments between B==C.  This requires the DF bit to be set on (or
copied to) the outer header, so that B or C can get ICMP notifications
for their ESP packets, which is what the sysctl we've been talking about
controls.  It ALSO requires that B and C do something useful with
those ICMP messages, to then expose the yet narrower tunnel to A and D.
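A sketch of both requirements, again with hypothetical names: the outer
DF decision, assuming the sysctl uses the KAME-style encoding (0 = clear,
1 = set, 2 = copy from the inner header), and the per-tunnel state B
would need so that an ICMP for one of its ESP packets ends up shrinking
the MTU it reports to A.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed meaning of net.inet.ipsec.dfbit (KAME-style encoding). */
enum dfbit_policy { DFBIT_CLEAR = 0, DFBIT_SET = 1, DFBIT_COPY = 2 };

/* DF bit to place in the outer header of an ESP tunnel packet. */
bool outer_df(enum dfbit_policy policy, bool inner_df)
{
    switch (policy) {
    case DFBIT_CLEAR: return false;
    case DFBIT_SET:   return true;
    case DFBIT_COPY:  return inner_df;
    }
    return false; /* unknown value: behave like "clear" */
}

/* Hypothetical per-tunnel PMTU state, not a NetBSD structure. */
struct tunnel_pmtu {
    size_t outer_pmtu;      /* learned MTU of the B==C path */
    size_t tunnel_overhead; /* assumed worst-case ESP tunnel overhead */
};

/* Called when an ICMP need-frag arrives for one of our ESP packets. */
void tunnel_pmtu_update(struct tunnel_pmtu *t, size_t reported_mtu)
{
    /* Only shrink in response to ICMP; raising it again would need
     * a separate timer, as in RFC 1191. */
    if (reported_mtu < t->outer_pmtu)
        t->outer_pmtu = reported_mtu;
}

/* MTU to advertise to A the next time an oversized DF packet arrives. */
size_t tunnel_inner_mtu(const struct tunnel_pmtu *t)
{
    assert(t->tunnel_overhead < t->outer_pmtu);
    return t->outer_pmtu - t->tunnel_overhead;
}
```

Note that B never forwards the ICMP it received; it only records the
smaller outer MTU and lets the normal "drop and send need-frag" path
report the reduced inner MTU to A on the next oversized packet.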

It's this last bit I'm not convinced we do right, though of course the
problem may also be earlier. There's at least the reasonable concern
that packets from outside the cryptosystem (unauthenticated ICMP) can
now influence the protected traffic.
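One possible mitigation for that concern, sketched under assumptions:
an ICMP error quotes the outer IP header plus at least the first 8 bytes
of the offending packet, which for ESP is the SPI and sequence number,
so B can at least check that the report refers to one of its own SAs and
is a plausible reduction. The function name and the 576-byte floor are
illustrative policy choices, not something the stack is known to do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical floor: ignore reports below the IPv4 minimum datagram
 * size every host must accept (RFC 791). A policy choice, not a rule. */
#define PMTU_FLOOR 576u

/* Should an unauthenticated ICMP need-frag be allowed to shrink the
 * MTU of this protected tunnel? */
bool icmp_pmtu_acceptable(uint32_t quoted_spi, uint32_t sa_spi,
                          size_t reported_mtu, size_t current_mtu)
{
    assert(current_mtu >= PMTU_FLOOR);
    if (quoted_spi != sa_spi)
        return false;          /* not about one of our SAs */
    if (reported_mtu >= current_mtu)
        return false;          /* not a reduction */
    if (reported_mtu < PMTU_FLOOR)
        return false;          /* implausibly small */
    return true;
}
```

An off-path attacker would then have to guess a live SPI, which does
not make the ICMP authenticated but does raise the bar somewhat.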

There is also the wrinkle that, when A&B or C&D are the same host,
there's no separate interface crossing and no ICMP return packet, so
the code has to handle this case specially, making even more of a
mess. Perhaps we handle the above right, and screw up in this case.
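For the case where A and B are the same host, a rough sketch of what
the special handling has to do: rather than generating an ICMP to
itself, the host lowers the path MTU on its own route so the transport
can shrink its segments. The route structure and function are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-destination route entry on the sending host. */
struct route_entry {
    size_t pmtu;
};

enum oversize_action { SEND_ICMP_TO_SOURCE, LOWER_LOCAL_PMTU };

/*
 * Called when a DF packet is too large for the tunnel. If it came from
 * another host (A != B), report via ICMP as a router would. If it was
 * generated locally (A == B), there is no interface crossing to turn
 * into an ICMP, so lower the route's PMTU directly and let the
 * transport (e.g. TCP) use smaller segments when it resends.
 */
enum oversize_action
handle_oversize(bool locally_generated, struct route_entry *rt,
                size_t tunnel_inner_mtu)
{
    if (!locally_generated)
        return SEND_ICMP_TO_SOURCE;
    assert(rt != NULL);
    if (tunnel_inner_mtu < rt->pmtu)
        rt->pmtu = tunnel_inner_mtu;
    return LOWER_LOCAL_PMTU;
}
```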

Either way, tunnel-mode IPsec TCP performance bites, and there's
clearly a fragmentation problem no matter which way you set the
various options. This is true even when B==C has no narrower-MTU
issues of its own (e.g., peers on a wireless segment) and C&D are the
same host.
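Some rough arithmetic shows why fragmentation appears even with no
narrower hop. The sketch below assumes AES-CBC with HMAC-SHA1-96 over
IPv4 (other transforms change the constants) and computes the
on-the-wire size of an ESP tunnel-mode packet following the RFC 4303
layout: ESP header, IV, payload padded to the cipher block, 2-byte
trailer, ICV.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed transform: AES-CBC (16-byte IV and block), HMAC-SHA1-96. */
#define OUTER_IPV4_HDR 20
#define ESP_HDR         8  /* SPI + sequence number */
#define ESP_IV         16
#define ESP_BLOCK      16
#define ESP_TRAILER     2  /* pad length + next header */
#define ESP_ICV        12

static_assert(ESP_BLOCK % 4 == 0, "ESP payload must stay 4-byte aligned");

/* On-the-wire size of an ESP tunnel-mode packet carrying inner_len bytes. */
size_t esp_tunnel_len(size_t inner_len)
{
    size_t payload = inner_len + ESP_TRAILER;
    size_t rem = payload % ESP_BLOCK;
    if (rem != 0)
        payload += ESP_BLOCK - rem; /* pad to the cipher block size */
    return OUTER_IPV4_HDR + ESP_HDR + ESP_IV + payload + ESP_ICV;
}

/* Largest inner packet that still fits in link_mtu once encapsulated. */
size_t esp_inner_mtu(size_t link_mtu)
{
    size_t inner = link_mtu;
    while (inner > 0 && esp_tunnel_len(inner) > link_mtu)
        inner--;
    return inner;
}
```

Under those assumptions a full-size 1500-byte packet (e.g. a TCP
segment with a 1460-byte MSS) becomes 1560 bytes once encapsulated, so
it is either fragmented or, with DF set on the outer header, dropped;
only inner packets of 1438 bytes or less fit in a single 1500-byte
frame.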

As an aside, I've often thought it was a mistake not to implement
ipsec as a logical interface, but that's an unfortunate consequence of
the way it was defined.  As I've said before, I simulate this by using
separate gif interfaces for the tunnels, and applying transport-mode
ipsec to those. I don't know how gif(4) handles PMTUD for the tunnel,
but I can manually set the interface MTU down.
