Subject: bin/10686: rpcbind doesn't always DTRT with non-local networks
To: None <gnats-bugs@gnats.netbsd.org>
From: Manuel Bouyer <bouyer@hera.lip6.fr>
List: netbsd-bugs
Date: 07/26/2000 05:46:14
>Number:         10686
>Category:       bin
>Synopsis:       rpcbind doesn't always DTRT with non-local networks
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    bin-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Jul 26 05:47:00 PDT 2000
>Closed-Date:
>Last-Modified:
>Originator:     Manuel Bouyer
>Release:        NetBSD 1.5_ALPHA as of 2 days ago
>Organization:

LIP6/RP, Universite Paris VI.
     
>Environment:
	
System: NetBSD hera 1.5_ALPHA NetBSD 1.5_ALPHA (HERA) #2: Tue Jul 25 13:24:33 MEST 2000 root@civry:/usr/sup_src/src/sys/arch/i386/compile/HERA i386

>Description:
	This machine is acting as a master NIS server, replacing a
	SunOS 4.1.4 box. The box has only one IP address, and is the NIS
	server for clients both on the local net (which use broadcast) and
	on remote IP segments (Linux, various NetBSD releases and Solaris).
	With the stock rpcbind, rpcbind dumps core soon after ypserv is
	started. This seems to happen as soon as some clients try to access
	ypserv (but not with every client: tests with a test domain and a
	reduced set of clients didn't show it). rpcbind was dumping core
	because cap->rmt_uaddr, passed to sscanf() in xdr_rmtcall_result(),
	was NULL. This seems to have already been reported in bin/10487, so
	I tried the patch proposed in that PR:
RCS file: /pub/NetBSD-CVS/basesrc/usr.sbin/rpcbind/rpcb_svc_com.c,v
retrieving revision 1.1.2.1
diff -u -r1.1.2.1 rpcb_svc_com.c
--- rpcb_svc_com.c      2000/06/23 08:16:03     1.1.2.1
+++ rpcb_svc_com.c      2000/07/26 10:23:40
@@ -448,7 +448,8 @@
                u_long port;
	 
		 /* interpret the universal address for TCP/IP */
-               if (sscanf(cap->rmt_uaddr, "%d.%d.%d.%d.%d.%d",
+               if ((cap->rmt_uaddr == 0) ||
+                   sscanf(cap->rmt_uaddr, "%d.%d.%d.%d.%d.%d",
			 &h1, &h2, &h3, &h4, &p1, &p2) != 6)
			return (FALSE);
		 port = ((p1 & 0xff) << 8) + (p2 & 0xff);

	With this patch, rpcbind no longer dumps core, but some remote
	clients can't access ypserv (clients on the local network didn't
	have any trouble). I debugged this with a NetBSD 1.3.2 client; I
	didn't look closely at what the others did, but obviously some of
	them worked.
	The problem was very strange: rpcinfo on the client had no problem
	talking to the server (both '-p' and '-u 100004'), but ypbind could
	not. A tcpdump showed that the client sent a UDP packet to the
	server's rpcbind, but the server never answered. Running rpcbind
	with '-d' showed a lot of "rpcbproc_callit_com:  duplicate request"
	messages.
	After some debugging, I found that the addrmerge() call in
	rpcbproc_callit_com() always returned NULL for the remote client
	(apparently because it didn't find any interface for this client,
	which is expected since the client is not on a local net). It seems
	that because of this, calls from different clients were handled as
	duplicate requests from the same client and ignored. The very first
	request may have been handled with the bin/10487 patch; I didn't
	check this.
	Based on other uses of addrmerge() or mergeaddr() in the code,
	I made this change:
diff -u -r1.1.2.1 rpcb_svc_com.c
--- rpcb_svc_com.c      2000/06/23 08:16:03     1.1.2.1
+++ rpcb_svc_com.c      2000/07/26 10:23:40
@@ -753,6 +754,8 @@
            addrmerge(&tbuf, rbl->rpcb_map.r_addr, NULL, nconf->nc_netid);
	m_uaddr = addrmerge(caller, rbl->rpcb_map.r_addr, NULL,
	    nconf->nc_netid);
+       if (m_uaddr == NULL)
+               m_uaddr = strdup(rbl->rpcb_map.r_addr);
#ifdef RPCBIND_DEBUG
	if (debugging)
	fprintf(stderr, "merged uaddr %s\n", m_uaddr);

I'm not sure what this is supposed to do (it looks like it uses the caller's
address instead of the merged one), but now m_uaddr is never NULL, and rpcbind
works properly with both local and remote clients (for ypserv, NFS, rstatd and
rquotad). However, I didn't try to understand rpcbind in depth, and I don't know
what m_uaddr is really used for, so this change may not be the right one.
I leave this to someone who really understands the code :)

>How-To-Repeat:
	Set up a YP server, and set up several clients on a different subnet.
	It may be necessary to reboot the server once all clients are
	running to trigger the bug.
>Fix:
	See above. The proposed patch may only be a workaround for my problem,
	not a real fix.
	Fixing this properly may also fix bin/10487 the right way.
>Release-Note:
>Audit-Trail:
>Unformatted: