Subject: kern/3508: [dM] ipforward_rt cache broken
To: None <gnats-bugs@gnats.netbsd.org>
From: der Mouse <mouse@Rodents.Montreal.QC.CA>
List: netbsd-bugs
Date: 04/17/1997 15:52:57
>Number:         3508
>Category:       kern
>Synopsis:       [dM] ipforward_rt cache broken
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people (Kernel Bug People)
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Apr 17 13:05:02 1997
>Last-Modified:
>Originator:     der Mouse
>Organization:
	Dis-
>Release:        1.2_BETA (code inspection implies also in -current)
>Environment:
	Any (noticed on a SPARC IPC)
>Description:
	ipforward_rt, which appears to be a size-one cache for IP
	packet forwarding, can produce broken routing.

	In this particular case, a NetBSD/sparc machine is on a local
	Ethernet; let's say its address there is 123.45.6.7, and its
	default route is to 123.45.6.1.  A PPP user exists, whose
	remote address is always (say) 123.45.9.1.  For reasons not
	relevant here, the machine is always advertising routes to
	123.45.9.* even when the PPP link is down - when it's down,
	packets for 123.45.9.1 just bounce back and forth between
	123.45.6.7 and 123.45.6.1 until their TTL expires.

	When the PPP user dials in and causes ppp0 to come up, a route
	is (correctly) installed, pointing 123.45.9.1 down ppp0.  The
	problem is that if ipforward_rt happens to hold 123.45.9.1's
	old route (out the local Ethernet to 123.45.6.1), packets for
	123.45.9.1 will still take that route even though it is no
	longer the current route.  Having the machine attempt to
	forward a packet to any other address promptly cures the
	problem.
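
	The failure mode can be reproduced in a small user-space model.
	This is a sketch under stated assumptions, not the NetBSD code:
	struct toy_rt, rt_add_host(), rt_lookup() and forward_lookup()
	are hypothetical stand-ins for the kernel's routing table, its
	route lookup, and the ipforward_rt check in ip_forward().  The
	cache is keyed only on the destination, so a routing-table
	change never invalidates it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical toy route entry; NOT the kernel's struct rtentry. */
struct toy_rt {
    const char *dst;   /* destination host, or NULL for the default route */
    const char *ifp;   /* outgoing interface name */
};

#define MAXHOST 8
static struct toy_rt default_rt = { NULL, "le0" };  /* default via 123.45.6.1 */
static struct toy_rt host_rts[MAXHOST];             /* host routes */
static int nhost;

/* Install a host route; host routes take precedence over the default. */
void rt_add_host(const char *dst, const char *ifp)
{
    assert(nhost < MAXHOST);
    host_rts[nhost].dst = dst;
    host_rts[nhost].ifp = ifp;
    nhost++;
}

/* Full lookup against the current table. */
struct toy_rt *rt_lookup(const char *dst)
{
    for (int i = 0; i < nhost; i++)
        if (strcmp(host_rts[i].dst, dst) == 0)
            return &host_rts[i];
    return &default_rt;
}

/* Size-one cache in the spirit of ipforward_rt: reuse the cached entry
 * whenever the destination matches, never asking whether the table has
 * changed since the entry was cached. */
static char cached_dst[32];
static struct toy_rt *cached_rt;

struct toy_rt *forward_lookup(const char *dst)
{
    if (cached_rt == NULL || strcmp(cached_dst, dst) != 0) {
        snprintf(cached_dst, sizeof cached_dst, "%s", dst);
        cached_rt = rt_lookup(dst);
    }
    return cached_rt;
}
```

	In this model forward_lookup("123.45.9.1") yields le0; after
	rt_add_host("123.45.9.1", "ppp0") it still yields le0, because
	the cached pointer refers to the default route, which is still
	valid.  One forward_lookup() for any other address refreshes
	the cache, and the next lookup for 123.45.9.1 yields ppp0.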
>How-To-Repeat:
	See above.  It's not hard to provoke this deliberately - find a
	machine with no other forwarding traffic, ping a host through
	(not from) it, change the routing table in a way that affects
	that host's route, ping/traceroute again, and notice that the
	old route is still used.  Cause the machine to forward a packet
	for any other address and retry, and notice it's magically
	fixed itself.
>Fix:
	Not sure what the right fix is.  On a non-busy machine the load
	from routing lookups is low, and on a busy machine it seems
	reasonably likely that a cache as small as one entry will miss
	more often than it hits, so I'd be tempted to remove the
	cache-check if entirely and always do the lookup.
	Alternatively, I'd add a heartbeat timer that clears the cache
	often relative to the timescale on which routes appear and
	disappear, but seldom relative to inter-packet arrival times;
	somewhere between 1 and 0.1 Hz seems reasonable to me.
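
	The heartbeat alternative can be sketched in the same kind of
	user-space model.  All names here (struct toy_rt, rt_add_host(),
	rt_lookup(), forward_lookup(), forward_cache_heartbeat()) are
	hypothetical illustrations, not NetBSD interfaces.  The sketch
	assumes some periodic timer, not shown, calls
	forward_cache_heartbeat() every 1 to 10 seconds, so a stale
	entry can survive at most one period.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical toy route entry; NOT the kernel's struct rtentry. */
struct toy_rt {
    const char *dst;   /* destination host, or NULL for the default route */
    const char *ifp;   /* outgoing interface name */
};

#define MAXHOST 8
static struct toy_rt default_rt = { NULL, "le0" };
static struct toy_rt host_rts[MAXHOST];
static int nhost;

void rt_add_host(const char *dst, const char *ifp)
{
    assert(nhost < MAXHOST);
    host_rts[nhost].dst = dst;
    host_rts[nhost].ifp = ifp;
    nhost++;
}

/* Full lookup: host routes win over the default route. */
struct toy_rt *rt_lookup(const char *dst)
{
    for (int i = 0; i < nhost; i++)
        if (strcmp(host_rts[i].dst, dst) == 0)
            return &host_rts[i];
    return &default_rt;
}

/* Size-one forwarding cache keyed only on destination. */
static char cached_dst[32];
static struct toy_rt *cached_rt;

struct toy_rt *forward_lookup(const char *dst)
{
    if (cached_rt == NULL || strcmp(cached_dst, dst) != 0) {
        snprintf(cached_dst, sizeof cached_dst, "%s", dst);
        cached_rt = rt_lookup(dst);
    }
    return cached_rt;
}

/* Hypothetical heartbeat, assumed to run from a periodic timer at
 * 0.1 to 1 Hz: dropping the cached entry bounds how long a stale
 * route can outlive a routing-table change. */
void forward_cache_heartbeat(void)
{
    cached_rt = NULL;
}
```

	Between heartbeats the cache still returns the stale entry;
	the heartbeat only limits how long the wrong route is used,
	at the cost of one extra full lookup per period.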

	The _right_ fix would probably be to explicitly clear that
	cache every time the routing tables get changed, or perhaps
	keep a routing-table generation count and have ip_forward
	ignore the cache if the routing table generation has changed.
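
	A sketch of the generation-count idea, again as a user-space
	model.  rt_generation, cached_gen and the toy table are
	hypothetical, not existing kernel variables.  Every
	routing-table change bumps rt_generation; the forwarding cache
	remembers the generation it was filled under and treats any
	mismatch as a miss.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical toy route entry; NOT the kernel's struct rtentry. */
struct toy_rt {
    const char *dst;   /* destination host, or NULL for the default route */
    const char *ifp;   /* outgoing interface name */
};

#define MAXHOST 8
static struct toy_rt default_rt = { NULL, "le0" };
static struct toy_rt host_rts[MAXHOST];
static int nhost;

/* Hypothetical routing-table generation count. */
static unsigned long rt_generation;

void rt_add_host(const char *dst, const char *ifp)
{
    assert(nhost < MAXHOST);
    host_rts[nhost].dst = dst;
    host_rts[nhost].ifp = ifp;
    nhost++;
    rt_generation++;   /* any table change invalidates cached lookups */
}

/* Full lookup: host routes win over the default route. */
struct toy_rt *rt_lookup(const char *dst)
{
    for (int i = 0; i < nhost; i++)
        if (strcmp(host_rts[i].dst, dst) == 0)
            return &host_rts[i];
    return &default_rt;
}

/* Size-one cache that also records the table generation it saw. */
static char cached_dst[32];
static struct toy_rt *cached_rt;
static unsigned long cached_gen;

struct toy_rt *forward_lookup(const char *dst)
{
    if (cached_rt == NULL || cached_gen != rt_generation ||
        strcmp(cached_dst, dst) != 0) {
        snprintf(cached_dst, sizeof cached_dst, "%s", dst);
        cached_rt = rt_lookup(dst);
        cached_gen = rt_generation;
    }
    return cached_rt;
}
```

	Compared with explicitly clearing the cache, the routing code
	only has to bump one counter and need not know which caches
	exist.  In a kernel the increment would be needed on every
	route add, delete and change; this model shows only additions.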

	For this machine I'll probably just toss the cache entirely and
	always do the lookup.  Does anyone have stats on the hit rate
	of that cache?  I didn't see any instrumentation in the code.
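
	Measuring the hit rate asked about above needs only two
	counters around the cache check.  The counters
	fwd_cache_hits and fwd_cache_misses, forward_cache_hit_rate(),
	and the stub rt_lookup() that always returns the default route
	are all hypothetical; as noted, the existing code has no such
	instrumentation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical toy route entry; NOT the kernel's struct rtentry. */
struct toy_rt {
    const char *ifp;   /* outgoing interface name */
};

static struct toy_rt default_rt = { "le0" };

/* Stand-in for the full routing lookup; always returns the default. */
struct toy_rt *rt_lookup(const char *dst)
{
    (void)dst;
    return &default_rt;
}

/* Size-one cache plus hypothetical hit/miss counters. */
static char cached_dst[32];
static struct toy_rt *cached_rt;
static unsigned long fwd_cache_hits, fwd_cache_misses;

struct toy_rt *forward_lookup(const char *dst)
{
    assert(dst != NULL);
    if (cached_rt != NULL && strcmp(cached_dst, dst) == 0) {
        fwd_cache_hits++;
        return cached_rt;
    }
    fwd_cache_misses++;
    snprintf(cached_dst, sizeof cached_dst, "%s", dst);
    cached_rt = rt_lookup(dst);
    return cached_rt;
}

/* Hit rate in [0, 1]; 0 if nothing has been forwarded yet. */
double forward_cache_hit_rate(void)
{
    unsigned long total = fwd_cache_hits + fwd_cache_misses;
    return total ? (double)fwd_cache_hits / (double)total : 0.0;
}
```

	With lookups for 123.45.9.1, 123.45.9.1, 123.45.8.2 and
	123.45.9.1 in that order, only the second hits, giving a rate
	of 0.25.  Interleaved traffic to several destinations, as on a
	busy router, is what would make a one-entry cache miss more
	often than not.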

	This code has changed some between 1.2 and -current, but
	reading the code makes me think the bug is probably present in
	-current as well.

					der Mouse

			       mouse@rodents.montreal.qc.ca
		     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
>Audit-Trail:
>Unformatted: