Subject: gre tunneling causing predictable reboot
To: None <tech-net@netbsd.org>
From: Rick Byers <RickB@BigScaryChildren.net>
List: tech-net
Date: 04/18/2001 21:26:00
Hi,

[ This is a long-winded message: executive summary is that doing greconfig
before ifconfig instead of the other way around appears to cause my
machine to predictably and consistently reboot without any error message
after running fine for some fixed and consistent period of time (around 5
minutes) ]

I've been using gre for a while for IP over IP tunneling.  I have a
few networks that use an internal private IP range with NAT to
connect to the internet.  To facilitate direct communication between
the internal machines (i.e. using the private IP ranges), each pair
of networks has a GRE tunnel between it.  Each gateway is running
NetBSD/i386 (some on 1.4, some on 1.5).

After much experimentation (a couple of years ago), I found the best
results were achieved by setting up the tunnels with a command sequence
like the following:
    ifconfig greN inet 192.168.1.1 192.168.2.1 netmask 0xffffffff link0 up
    greconfig -i greN -s 24.1.2.3 -d 27.4.5.6
    route add -net 192.168.2 192.168.2.1
where 192.168.1.1 and 24.1.2.3 are the IP addresses (internal and
external) of the local NetBSD box and 192.168.2.1 and 27.4.5.6 are the IP
addresses of the remote NetBSD box.  
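
For illustration, here is a minimal sketch of the same sequence with the
addresses pulled out into arguments.  The function name gre_tunnel_cmds, its
argument order, and the unit number gre0 are assumptions made up for this
example.  It only prints the commands rather than running them, so the order
can be inspected before anything touches a live gateway.

```shell
#!/bin/sh
# Hypothetical helper (not part of NetBSD): prints the tunnel setup commands
# in the order described above instead of executing them.
# Arguments: interface, local internal IP, remote internal IP,
#            local external IP, remote external IP, remote network.
gre_tunnel_cmds() {
    ifname=$1 local_in=$2 remote_in=$3 local_ex=$4 remote_ex=$5 remote_net=$6
    # 1. Configure the interface with the internal addresses first.
    echo "ifconfig $ifname inet $local_in $remote_in netmask 0xffffffff link0 up"
    # 2. Then set the tunnel endpoints to the external addresses.
    echo "greconfig -i $ifname -s $local_ex -d $remote_ex"
    # 3. Finally route the remote internal network to the remote internal IP.
    echo "route add -net $remote_net $remote_in"
}

# Example call using the addresses from this message; gre0 is an assumed unit.
gre_tunnel_cmds gre0 192.168.1.1 192.168.2.1 24.1.2.3 27.4.5.6 192.168.2
```

Piping the output to sh would run the commands for real; printing them first
is just a cautious default for this sketch.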

I'm not positive that setting the tunnel endpoints to different addresses
than the interface is the right thing to do, but it's the only thing that
appears to give the exact behaviour I want.  Specifically, I want it such
that: 
- any computer on the 192.168.1 network (including the server) can
talk to any computer on the 192.168.2 network (including the server)
through the tunnel 
- any traffic addressed to the external address of the remote server does
NOT go over the tunnel (so that if the remote server doesn't have its side
of the tunnel up, it is still contactable) 
- if the local server pings the remote server's internal IP (192.168.2.1
in this example), the interior IP header should have a source of
192.168.1.1, so that when requests are made over the tunnel, responses are
received over the tunnel.  This is why I use the internal addresses when
configuring the interface instead of using the same addresses as the
tunnel endpoints - and hence the reason I'm using gre instead of ipip.

First of all, does this seem reasonable?  Is there something I'm missing
as to how this should be set up properly?

Now onto the real issue:
Today I re-wrote all my hard-coded scripts into a nice generic rc.d
style script and config file.  After making some other changes, I started
having problems with my server.  It would stay up for about 5 minutes and
then just reboot (no panic message or anything).  I went out for a few
hours and when I came back and checked the logs, it had been rebooting
continuously.  The strange thing was that the time between boot-ups
(i.e. the time the first syslog messages were created per cycle) was
almost exactly 7 minutes every time - for 25 reboots!  This means that the
server was crashing after almost the exact same uptime (within a few
seconds or so) every time.

After selectively undoing each change I made during the day, one at a time,
I eventually narrowed it down to a minor change I made in the gre
scripts.  For some reason, I had put the greconfig line before the
ifconfig line.  Restoring the order of these two lines cleared up the
problem.  This seems very strange to me, and I have no idea how to debug a
problem like this.
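
To guard against the same slip in a generated script, something like the
following sketch could flag the bad ordering before it is run.  The function
name check_gre_order and the unit number gre0 are made up for this example,
and the matching is deliberately naive: it only recognises lines that begin
with "ifconfig IFNAME" or "greconfig -i IFNAME".

```shell
#!/bin/sh
# Hypothetical check: reads a script on stdin and exits non-zero if a
# "greconfig -i IFNAME" line appears before any "ifconfig IFNAME" line.
check_gre_order() {
    ifname=$1
    awk -v ifn="$ifname" '
        # Remember that the interface has been configured.
        $1 == "ifconfig" && $2 == ifn { seen_if = 1 }
        # A greconfig for the same interface before that is the bad order.
        $1 == "greconfig" && $2 == "-i" && $3 == ifn {
            if (!seen_if) bad = 1
        }
        END { exit (bad ? 1 : 0) }
    '
}

# Example: the order that triggered the reboots here is reported as bad.
printf '%s\n' \
    'greconfig -i gre0 -s 24.1.2.3 -d 27.4.5.6' \
    'ifconfig gre0 inet 192.168.1.1 192.168.2.1 netmask 0xffffffff link0 up' |
    check_gre_order gre0 || echo "gre0: greconfig runs before ifconfig"
```

This only checks the order of the two lines; it says nothing about why the
order matters.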

This is all happening on my NetBSD-1.5.1_BETA/i386 box.  I'm using
ipf/ipnat and rp-pppoe for connectivity & firewall.

It's a mystery to me.  But just in case anyone else might experience the
same frustrating problem, I thought I'd report it.  Anyone care to take a
crack at the cause of this?  Since this problem is completely repeatable
on my machine, I'm willing to do experiments (after I get back from
Florida on the 26th) to help narrow down the cause.

Thanks,
	Rick