Subject: Re: kern/3508 bug: cached ip route and interface up/down.
To: None <gnats-bugs@netbsd.org, tech-net@netbsd.org, tech-kern@netbsd.org>
From: Tad Hunt <tad@entrisphere.com>
List: tech-net
Date: 11/12/2002 15:08:15
In message <20021112173218.C18167@wooj.nexthop.com>, Nick said:
;It's been pointed out on current-users that the kernel is "simple-minded"
;about routing.  In your case, if the interface a route was using goes down,
;it's still up to a userland program to reroute the traffic by changing
;the table.
;
;A program such as gated(8) watches for interface changes like this, and
;will update e.g. static routes as necessary.  So in this case, without
;alternate paths available, when ifc1 goes down, the route is deleted.
;When it comes back up, the route is re-added.  At this point ip_output()
;notices that the cached route is dead and fetches the new route.
;
;Nick

Except it doesn't.  The interface changes were simply an easy way
to cause a system to exhibit the problem.

The problem is that this variable, "ipforward_rt", isn't part of
the routing table, so modifying the routing table is not enough to
fix the problem.  It is a cached copy of a route from the routing
table, held inside the ip stack.  It doesn't matter what my routing
table says when the ip stack is not even referring to what currently
exists in the routing table.

Previously, I asked about some unrelated interface issues, which were
solved by running gated(8) to manage the routing table.  This works
well and solves those problems.  However, gated(8) cannot do anything
about a route that is cached inside the ip stack.

You can cause the same problem without touching the interfaces,
simply by modifying the routing table.

Given such a routing table:

	default        GW 192.167.1.1
	192.168.1.0/24 GW 192.168.2.1

1) Assume the traffic is using the second route.

2) Delete the second route.

3) The traffic still attempts to use the second route, even though
   it no longer exists in the routing table.

4) Unless I'm mistaken, there is no way for a user process to clear
   the cached route from the ipforward_rt variable.

5) When a packet arrives that must be forwarded to a different
   destination, the contents of the ipforward_rt variable are
   discarded and refilled with the route for that new packet.  When
   a packet for the old destination then arrives, the cached route is
   replaced again, this time from the current routing table.  That
   is, the problem goes away as soon as a packet for any other
   destination arrives, because it thrashes the cache (of size 1).

-Tad