Subject: incorrect connect() behavior (fwd)
To: None <tech-kern@netbsd.org>
From: Maxim Konovalov <maxim@macomnet.ru>
List: tech-kern
Date: 06/10/2004 21:30:32
Gentlemen,

To make a long story short: in_pcbconnect() does not return ENETUNREACH
if it fails to find a route to the destination.  Instead it finds the
first non-loopback interface and binds the local end of the connection
to its IP address.  This is more or less OK for TCP, because ip_output()
soon returns EHOSTUNREACH, but it is not the case for UDP.

Both Solaris 8 and Linux 2.4.x return ENETUNREACH on
connect(SOCK_DGRAM) as soon as they discover the absence of a route.

What do people think about the enclosed patch?  Have we missed something?

Index: src/sys/netinet/in_pcb.c
===================================================================
RCS file: /home/netbsd/src/sys/netinet/in_pcb.c,v
retrieving revision 1.95
diff -u -r1.95 in_pcb.c
--- src/sys/netinet/in_pcb.c	25 Apr 2004 16:42:42 -0000	1.95
+++ src/sys/netinet/in_pcb.c	10 Jun 2004 17:28:26 -0000
@@ -984,11 +984,8 @@
 		ia = ifatoia(ifa_ifwithladdr(sintosa(sin)));
 		sin->sin_port = fport;
 		if (ia == 0) {
-			/* Find 1st non-loopback AF_INET address */
-			TAILQ_FOREACH(ia, &in_ifaddrhead, ia_list) {
-				if (!(ia->ia_ifp->if_flags & IFF_LOOPBACK))
-					break;
-			}
+			*errorp = ENETUNREACH;
+			return NULL;
 		}
 		if (ia == NULL) {
 			*errorp = EADDRNOTAVAIL;
%%%

-- 
Maxim Konovalov

---------- Forwarded message ----------
Date: Mon, 7 Jun 2004 23:02:40 +0400 (MSD)
From: Maxim Konovalov <maxim@macomnet.ru>
To: Gleb Smirnoff <glebius@cell.sick.ru>
Cc: freebsd-current@freebsd.org
Subject: Re: incorrect connect() behavior

[ Changed CC: in the hope of reaching a wider audience. ]

On Sun, 30 May 2004, 14:07+0400, Gleb Smirnoff wrote:

>   Dear networkers,
>
>   there is a problem with the connect() syscall, which can be reproduced
> on a box running without a default route.
>
> According to POSIX, connect() must return ENETUNREACH if a route to the
> destination was not found.
>
> http://www.opengroup.org/onlinepubs/000095399/functions/connect.html
>
>   In the case of SOCK_STREAM it works this way. But in the case of

Yep, the absence of a route is discovered in ip_output() and
EHOSTUNREACH is returned.

> SOCK_DGRAM, connect() does not return an error. Instead it picks the first
> available local IP address for the local side of the socket. In some cases
> this address may turn out to be 127.0.0.1. Later, when a route to the
> destination shows up, datagrams fail to send, since 127.0.0.1
> cannot appear on the wire.
>
> Affected installations are:
>  - BGP routers without a default route
>  - localnet routers running some IGP
>
> Affected applications are:
>
> - ntpd. ntpd starts before the routing daemon has established all
>   adjacencies, so connect() binds to 127.0.0.1. Later, when routes show
>   up, ntpd fails to send datagrams to the server.
> - net-snmpd. It is difficult to reproduce, but after some route
>   flapping snmpd hangs and does not respond to requests. This can
>   be worked around with a static route to the source of the queries.
> - ng_ksocket. If the node is of type inet/dgram/udp and a connect
>   message is sent to it, it does not return an error. Later it fails
>   to send packets with EPERM.
>
> Attached is a test case for this problem, no-route-test.c. To
> test, delete the default route, compile no-route-test and
> run it. If connect() picks a non-localhost address, then you are
> lucky :), one of your interfaces was configured before lo0. To
> reproduce the problem reliably, lo0 must be the first entry in
> the ${network_interfaces} variable in /etc/rc.conf. Then add
> the default route and look at what is printed by
> no-route-test, which was started before the route was added.

One can also use udpcliserv/udpcli09.c from UNPv1.

> I have written two patches to deal with this problem. The first one
> sticks to POSIX behavior: it returns ENETUNREACH. I have tested
> ntpd with it, and it works well. But there is no guarantee that
> nothing else will break. The second patch is a POLA patch: it
> makes connect() take the first non-localhost address for the local
> side of the socket. The code was taken directly from NetBSD. This
> patch is not expected to break anything. Both patches are attached.

I prefer the former: return ENETUNREACH as soon as we detect that we do
not have a route to the network.  Btw, Solaris 8 and Linux 2.4.x work
this way.  Discussing the issue with the NetBSD people is a good idea too.

%%%
Index: in_pcb.c
===================================================================
RCS file: /home/ncvs/src/sys/netinet/in_pcb.c,v
retrieving revision 1.147
diff -u -r1.147 in_pcb.c
--- in_pcb.c	20 May 2004 06:35:02 -0000	1.147
+++ in_pcb.c	29 May 2004 21:12:40 -0000
@@ -612,9 +612,7 @@
 			if (ia == 0)
 				ia = ifatoia(ifa_ifwithnet(sintosa(&sa)));
 			if (ia == 0)
-				ia = TAILQ_FIRST(&in_ifaddrhead);
-			if (ia == 0)
-				return (EADDRNOTAVAIL);
+				return (ENETUNREACH);
 		}
 		/*
 		 * If the destination address is multicast and an outgoing
%%%

Any comments?

-- 
Maxim Konovalov