Subject: IPSec tunnel broke with ~latest 1.6Z
To: None <current-users@netbsd.org>
From: Arto Selonen <arto@selonen.org>
List: current-users
Date: 09/09/2003 18:39:54
Hi!

I don't know whether I just picked a bad time to try and upgrade my 1.6T
to 1.6Y (Saturday), and then to 1.6Z (Sunday, Monday, Tuesday), but after
getting the sources from anoncvs today (around noon, local time EEST),
my IPSec tunnel to another 1.6T broke at least partially. Looking at
current-users and source-changes I didn't notice anything that would
indicate an expected breakdown (not that I'd _really_ know, anyway). :)

Before I dig deeper into this, I'd like to know whether this is
Yet Another Bad Anoncvs Snapshot, a user error (hey, I didn't touch
anything that was working, other than the OS upgrade), or something that
needs to be fixed in source (but is not already being worked on).

Here's my problem:

                  <IPSec tunnel>
SOMENET <--> 1.6Z <--INTERNET--> 1.6T <---> OTHERNET

The above NetBSD systems have a transport mode IPSec tunnel between
them with a manual key. The setup has worked for well over a year now.
Today, I cvs updated my sources from anoncvs(fi), and once the new kernel
was running, the tunnel broke (partially; I'll explain in a sec).
Installing matching userland didn't change anything; kernel config was
from 1.6T + COMPAT_16.

Basically, connections from {OTHER or 1.6T} to 1.6Z work OK (and nothing
ever really needed to connect to SOMENET, so I wouldn't know if things
changed in that respect), but from {SOMENET or 1.6Z} I can't get to
{1.6T or OTHERNET} (as I used to before the upgrade).

A rough sketch for an ICMP echo from SOMENET (to demonstrate the problem):
(this used to work before the 1.6T -> 1.6Z upgrade)

	1) ICMP echo sent from SOMENET to OTHERNET
	2) packet received at 1.6Z if@SOME
	3) packet observed going out on gif (the tunnel) at 1.6Z
	4) packet observed coming in on gif (the tunnel) at 1.6T
	5) packet forwarded to OTHERNET
	6) packet received at destination, ICMP echo reply sent
	7) reply packet observed going to 1.6Z on gif at 1.6T
	8) IPSEC traffic observed coming from 1.6T on if@INTERNET at 1.6Z
	9) nothing shows up on the tunnel gif at 1.6Z
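
The steps above can be read as a list of checkpoints, where the first
one that does not see the packet localizes the loss. A minimal Python
sketch of that reading, assuming the observations are typed in by hand
from tcpdump output on each interface (the Checkpoint type and the
first_missing helper are hypothetical names for illustration, not part
of any NetBSD tool):

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Checkpoint:
    step: int    # step number from the list in this message
    where: str   # host/interface where the packet was looked for
    what: str    # what was expected to show up there
    seen: bool   # whether it was actually observed


def first_missing(trace: List[Checkpoint]) -> Optional[Checkpoint]:
    """Return the first checkpoint where the packet was not observed."""
    for cp in trace:
        if not cp.seen:
            return cp
    return None


# Observations transcribed from the nine steps above (hand-entered,
# not parsed from real tcpdump output).
TRACE = [
    Checkpoint(1, "SOMENET host", "ICMP echo sent to OTHERNET", True),
    Checkpoint(2, "1.6Z if@SOME", "echo received", True),
    Checkpoint(3, "1.6Z gif", "echo going out on the tunnel", True),
    Checkpoint(4, "1.6T gif", "echo coming in on the tunnel", True),
    Checkpoint(5, "1.6T", "echo forwarded to OTHERNET", True),
    Checkpoint(6, "OTHERNET host", "echo received, reply sent", True),
    Checkpoint(7, "1.6T gif", "reply going out to 1.6Z", True),
    Checkpoint(8, "1.6Z if@INTERNET", "IPSec traffic from 1.6T", True),
    Checkpoint(9, "1.6Z gif", "decapsulated reply", False),
]

if __name__ == "__main__":
    lost = first_missing(TRACE)
    if lost is None:
        print("packet seen at every checkpoint")
    else:
        print(f"lost at step {lost.step}: {lost.where} ({lost.what})")
```

With the observations listed here, the first miss is step 9: the reply
reaches if@INTERNET on 1.6Z as IPSec traffic but never shows up on the
gif interface, so the loss is somewhere between those two points on
the 1.6Z side.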

Since I can connect to 1.6Z through the tunnel, both ends can obviously
encode and decode the packets properly. The problem seems to be related
to the IP addresses of the 1.6Z box. When OTHER targets 1.6Z by its public
IP, things work; when 1.6Z (or SOMENET) sends traffic, it is sent from
private address space (handled nicely at 1.6T and OTHER). When the return
traffic arrives (tcpdump shows ESP/AH packets coming in), it just
"disappears" (tcpdump on the gif interface shows nothing, no application
sees the packets, etc). Maybe a new interaction? FAST_IPSEC?

Will things return to "normal" (the way they were) by upgrading 1.6T to
1.6Z, upgrading 1.6Z again in a few days, or do I need to re-configure
the whole setup to restore the previous connectivity? What did I miss?

[If somebody is interested, I can provide more details/config files, etc]


Arto Selonen
#######======------  http://www.selonen.org/arto/  --------========########
Everstinkuja 5 B 35                               Don't mind doing it.
FIN-02600 Espoo        arto@selonen.org         Don't mind not doing it.
Finland              tel +358 50 560 4826     Don't know anything about it.