Subject: Re: IPv6 over GRE tunneling?
To: None <tech-net@netbsd.org>
From: Gert Doering <gert@greenie.muc.de>
List: tech-net
Date: 01/25/2005 23:47:11
Hi,

On Sun, Jan 23, 2005 at 09:54:08PM +0100, Gert Doering wrote:
> and I think that modifications to if_gre.c & ip_gre.c to accept 
> IPv6-over-GRE should not be too hard.

Actually it was fairly trivial - except for one really nasty bug in 
net/if_gre.c that hits everyone trying to send a non-IPv4 packet
over GRE (ip->ip_tos is copied to the new IP header, and "ip" is NULL)
- a bug that will bite XEROX and NS users as well, if there are any.
[-> the parts of my diff related to ip_tos in net/if_gre.c should be 
integrated in any case!]
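
To make the ip_tos problem concrete, here is a minimal userland sketch of
the idea behind the fix: default the outer TOS to 0 and copy it from the
inner header only when the payload really is IPv4. All names in it
(sketch_family, sketch_ip, sketch_outer_tos) are made up for illustration
and are not kernel identifiers; the real change is in the diff below.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's address families. */
enum sketch_family { SK_AF_INET, SK_AF_INET6, SK_AF_NS };

/* Hypothetical stand-in for struct ip; only the first two bytes matter. */
struct sketch_ip {
	uint8_t ver_hl;	/* version and header length */
	uint8_t tos;	/* type of service */
};

/*
 * Return the TOS value for the outer IPv4 header.
 * Only an IPv4 payload has an inner IPv4 header to copy the TOS from;
 * every other payload gets the default of 0 instead of dereferencing
 * a pointer that was never set.
 */
static uint8_t
sketch_outer_tos(enum sketch_family family, const void *payload)
{
	uint8_t tos = 0;

	if (family == SK_AF_INET && payload != NULL)
		tos = ((const struct sketch_ip *)payload)->tos;
	return tos;
}
```

With this shape, an IPv6 or NS payload simply ends up with TOS 0 in the
outer header, which is what the ip_tos variable in the patch does.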

Diffs are appended below (vs. -current), and I'd appreciate it if you 
could review them.

There is one problem remaining: for IPv6-over-GRE packets, there is a
weird delay upon reception.  It's like schednetisr() isn't called,
except that the delay always seems to be about 300-1200 ms, while without
schednetisr(), it's indefinite (that is: until another interface receives
an IPv6 packet, to be precise).
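
For readers not familiar with the mechanism, the following toy model shows
why a missing schednetisr() would stall a packet until something else
schedules the same queue. It is a deliberately simplified userland sketch,
not the NetBSD implementation; sk_netisr, sk_enqueue, sk_schedule and
sk_run are hypothetical names, and the real softint machinery is more
involved.

```c
/* Hypothetical queue indices, standing in for NETISR_IP / NETISR_IPV6. */
enum { SK_ISR_IP, SK_ISR_IPV6, SK_ISR_COUNT };

struct sk_netisr {
	int queued[SK_ISR_COUNT];	/* packets waiting per protocol queue */
	unsigned pending;		/* bit n set: queue n needs servicing */
};

/* Enqueue only; models IF_ENQUEUE() without a following schednetisr(). */
static void
sk_enqueue(struct sk_netisr *n, int isr)
{
	n->queued[isr]++;
}

/* Mark a queue as needing service; models schednetisr(). */
static void
sk_schedule(struct sk_netisr *n, int isr)
{
	n->pending |= 1u << isr;
}

/* Service every queue whose bit is set; returns packets processed. */
static int
sk_run(struct sk_netisr *n)
{
	int i, done = 0;

	for (i = 0; i < SK_ISR_COUNT; i++) {
		if (n->pending & (1u << i)) {
			done += n->queued[i];
			n->queued[i] = 0;
		}
	}
	n->pending = 0;
	return done;
}
```

In this model a packet that is enqueued but never scheduled stays in its
queue across any number of sk_run() calls, and is only picked up once some
other event schedules the same queue.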

Look at this tcpdump on the LAN interface:

23:19:26.267313 195.30.70.42 > 193.149.48.168: gre 2001:608:4:4444::1 > 2001:608:4:4444::2: icmp6: echo request (len 60, hlim 64) (ttl 255, id 825, len 124)
23:19:27.376332 193.149.48.168 > 195.30.70.42: gre 2001:608:4:4444::2 > 2001:608:4:4444::1: icmp6: echo reply (len 60, hlim 64) (ttl 30, id 520, len 124)
23:19:27.386273 195.30.70.42 > 193.149.48.168: gre 2001:608:4:4444::1 > 2001:608:4:4444::2: icmp6: echo request (len 60, hlim 64) (ttl 255, id 826, len 124)
23:19:27.678095 193.149.48.168 > 195.30.70.42: gre 2001:608:4:4444::2 > 2001:608:4:4444::1: icmp6: echo reply (len 60, hlim 64) (ttl 30, id 521, len 124)

195.30.70.42 is a Cisco router, sending IPv6 pings to 193.149.48.168, 
which is the NetBSD machine in question.

The echo request comes in at 23:19:26.267, while the echo reply doesn't
leave before 23:19:27.376 - over a second later.

For the second echo request/reply, the delay is only about 290 ms, but
still way too high.
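
The delays quoted here are just the differences between the tcpdump
timestamps of each request and its reply. A small helper to compute them
might look like this (ts_to_usec and reply_delay_usec are hypothetical
names; the sketch assumes timestamps of the form HH:MM:SS.uuuuuu with
exactly six fractional digits and no midnight wrap):

```c
#include <stdio.h>

/* Convert "HH:MM:SS.uuuuuu" to microseconds since midnight; -1 on error. */
static long long
ts_to_usec(const char *ts)
{
	int h, m, s;
	long us;

	if (sscanf(ts, "%d:%d:%d.%6ld", &h, &m, &s, &us) != 4)
		return -1;
	return (((long long)h * 60 + m) * 60 + s) * 1000000LL + us;
}

/* Microseconds between a request and its reply; -1 on parse error. */
static long long
reply_delay_usec(const char *request_ts, const char *reply_ts)
{
	long long req = ts_to_usec(request_ts);
	long long rep = ts_to_usec(reply_ts);

	if (req < 0 || rep < 0)
		return -1;
	return rep - req;
}
```

For the two pairs above this gives 1109019 us (23:19:26.267313 to
23:19:27.376332) and 291822 us (23:19:27.386273 to 23:19:27.678095).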

Testing with asymmetric tunneling and with IPv4-over-GRE confirms that it's
definitely the receiving path for IPv6-over-GRE (if the tunnel is only
sending, no "weird delays" occur).


The problem *is* related to "schednetisr()" processing in some way.  

If I run a "ping6 -i 0.1 $someotherhost" on the LAN, to make sure the IPv6
input queue is permanently serviced, the tcpdump looks different:

23:24:21.600740 195.30.70.42 > 193.149.48.168: gre 2001:608:4:4444::1 > 2001:608:4:4444::2: icmp6: echo request (len 60, hlim 64) (ttl 255, id 846, len 124)
23:24:21.690332 193.149.48.168 > 195.30.70.42: gre 2001:608:4:4444::2 > 2001:608:4:4444::1: icmp6: echo reply (len 60, hlim 64) (ttl 30, id 707, len 124)
23:24:21.700073 195.30.70.42 > 193.149.48.168: gre 2001:608:4:4444::1 > 2001:608:4:4444::2: icmp6: echo request (len 60, hlim 64) (ttl 255, id 847, len 124)
23:24:21.773719 193.149.48.168 > 195.30.70.42: gre 2001:608:4:4444::2 > 2001:608:4:4444::1: icmp6: echo reply (len 60, hlim 64) (ttl 30, id 709, len 124)

- same two machines, identical tunnel configuration, but delay is down
to 10-90 ms.  Running "ping -i 0.02 $otherhost" in parallel reduces the
delay further (as is to be expected).

Now I need your help :-) - which part of the code shall I poke, to find
out where these reception delays happen?

System environment: NetBSD/sparc64, Sun Ultra 5, -current as of yesterday
(2005/01/24).

gert

----------- snip ----------
Index: net/if_gre.c
===================================================================
RCS file: /cvsroot/src/sys/net/if_gre.c,v
retrieving revision 1.54
diff -u -r1.54 if_gre.c
--- net/if_gre.c	6 Dec 2004 02:59:23 -0000	1.54
+++ net/if_gre.c	25 Jan 2005 22:28:26 -0000
@@ -7,6 +7,8 @@
  * This code is derived from software contributed to The NetBSD Foundation
  * by Heiko W.Rupp <hwr@pilhuhn.de>
  *
+ * IPv6-over-GRE contributed by Gert Doering <gert@greenie.muc.de>
+ *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
  * are met:
@@ -180,6 +182,7 @@
 	struct gre_softc *sc = ifp->if_softc;
 	struct greip *gh;
 	struct ip *ip;
+	u_int8_t ip_tos = 0;
 	u_int16_t etype = 0;
 	struct mobile_h mob_h;
 
@@ -263,9 +266,14 @@
 			goto end;
 		}
 	} else if (sc->g_proto == IPPROTO_GRE) {
+#ifdef GRE_DEBUG
+		printf( "start gre_output/GRE, dst->sa_family=%d\n", 
+		         dst->sa_family );
+#endif
 		switch (dst->sa_family) {
 		case AF_INET:
 			ip = mtod(m, struct ip *);
+			ip_tos = ip->ip_tos;
 			etype = ETHERTYPE_IP;
 			break;
 #ifdef NETATALK
@@ -278,6 +286,11 @@
 			etype = ETHERTYPE_NS;
 			break;
 #endif
+#ifdef INET6
+		case AF_INET6:
+			etype = ETHERTYPE_IPV6;
+			break;
+#endif
 		default:
 			IF_DROP(&ifp->if_snd);
 			m_freem(m);
@@ -312,7 +325,7 @@
 		gh->gi_dst = sc->g_dst;
 		((struct ip*)gh)->ip_hl = (sizeof(struct ip)) >> 2;
 		((struct ip*)gh)->ip_ttl = ip_gre_ttl;
-		((struct ip*)gh)->ip_tos = ip->ip_tos;
+		((struct ip*)gh)->ip_tos = ip_tos;
 		gh->gi_len = htons(m->m_pkthdr.len);
 	}
 
@@ -381,6 +394,10 @@
 		case AF_INET:
 			break;
 #endif
+#ifdef INET6
+		case AF_INET6:
+			break;
+#endif
 		default:
 			error = EAFNOSUPPORT;
 			break;
Index: netinet/ip_gre.c
===================================================================
RCS file: /cvsroot/src/sys/netinet/ip_gre.c,v
retrieving revision 1.30
diff -u -r1.30 ip_gre.c
--- netinet/ip_gre.c	26 Apr 2004 01:31:56 -0000	1.30
+++ netinet/ip_gre.c	25 Jan 2005 22:28:27 -0000
@@ -7,6 +7,8 @@
  * This code is derived from software contributed to The NetBSD Foundation
  * by Heiko W.Rupp <hwr@pilhuhn.de>
  *
+ * IPv6-over-GRE contributed by Gert Doering <gert@greenie.muc.de>
+ *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
  * are met:
@@ -145,7 +147,7 @@
 gre_input2(struct mbuf *m, int hlen, u_char proto)
 {
 	struct greip *gip;
-	int s;
+	int s, isr;
 	struct ifqueue *ifq;
 	struct gre_softc *sc;
 	u_int16_t flags;
@@ -186,22 +188,31 @@
 		switch (ntohs(gip->gi_ptype)) { /* ethertypes */
 		case ETHERTYPE_IP: /* shouldn't need a schednetisr(), as */
 			ifq = &ipintrq;          /* we are in ip_input */
+			isr = NETISR_IP;
 			break;
 #ifdef NS
 		case ETHERTYPE_NS:
 			ifq = &nsintrq;
-			schednetisr(NETISR_NS);
+			isr = NETISR_NS;
 			break;
 #endif
 #ifdef NETATALK
 		case ETHERTYPE_ATALK:
 			ifq = &atintrq1;
-			schednetisr(NETISR_ATALK);
+			isr = NETISR_ATALK;
 			break;
 #endif
+#ifdef INET6
 		case ETHERTYPE_IPV6:
-			/* FALLTHROUGH */
+#ifdef GRE_DEBUG
+			printf( "ip_gre.c/gre_input2: IPv6 packet\n" );
+#endif
+			ifq = &ip6intrq;
+			isr = NETISR_IPV6;
+			break;
+#endif
 		default:	   /* others not yet supported */
+			printf( "ip_gre.c/gre_input2: unhandled ethertype 0x%04x\n", (int) ntohs(gip->gi_ptype) );
 			return (0);
 		}
 		break;
@@ -239,6 +250,8 @@
 	} else {
 		IF_ENQUEUE(ifq, m);
 	}
+	/* we need schednetisr since the address family may change */
+	schednetisr(isr);
 	splx(s);
 
 	return (1);	/* packet is done, no further processing needed */
-- 
USENET is *not* the non-clickable part of WWW!
                                                           //www.muc.de/~gert/
Gert Doering - Munich, Germany                             gert@greenie.muc.de
fax: +49-89-35655025                        gert@net.informatik.tu-muenchen.de