Subject: amd problem, need Sun RPC expert
To: None <tech-net@netbsd.org>
From: Manuel Bouyer <bouyer@antioche.lip6.fr>
List: tech-net
Date: 04/06/1999 14:23:27
Hi,
I've found a problem with amd and a SunOS 4.1.4 server, which seems related to
the way it does RPC calls.
At first use of a mount point, amd will first try to detect the best
NFS version and transport to use, unless explicitly stated in the config
file (some time ago, I changed it to prefer UDP over TCP).
This is done in amd/amd/srvr_nfs.c, around line 734 (function
find_nfs_srvr()). It calls get_nfs_version() first for the UDP transport,
then for TCP. get_nfs_version() returns the highest version available on the
server (0 if no version is available for this transport). find_nfs_srvr()
retains the transport which provides the highest version, preferring UDP.
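
To make the selection rule concrete, here is a minimal sketch in C. It is
not the code from srvr_nfs.c: pick_transport(), struct nfs_choice, probe_fn
and the fake_* names are all hypothetical, and the probe is a stand-in for
get_nfs_version() that reads two test variables instead of talking to a
server.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for get_nfs_version(): returns the highest NFS
 * version offered over the given transport, or 0 if there is none. */
typedef unsigned long (*probe_fn)(const char *host, const char *proto);

struct nfs_choice {
    const char   *proto;    /* "udp" or "tcp"; NULL if nothing is usable */
    unsigned long version;  /* 0 if nothing is usable */
};

/* Probe UDP first, then TCP. TCP replaces UDP only when it offers a
 * strictly higher version, so a tie goes to UDP. */
static struct nfs_choice pick_transport(const char *host, probe_fn probe)
{
    struct nfs_choice best = { NULL, 0 };
    unsigned long v;

    assert(probe != NULL);

    v = probe(host, "udp");
    if (v > 0) {
        best.proto = "udp";
        best.version = v;
    }
    v = probe(host, "tcp");
    if (v > best.version) {
        best.proto = "tcp";
        best.version = v;
    }
    return best;
}

/* Fake server description, for demonstration only. */
static unsigned long fake_udp_version;
static unsigned long fake_tcp_version;

static unsigned long fake_probe(const char *host, const char *proto)
{
    (void)host; /* the fake ignores the host name */
    return strcmp(proto, "udp") == 0 ? fake_udp_version : fake_tcp_version;
}
```

With fake_udp_version set to 2 and fake_tcp_version set to 0 (what the
SunOS 4.1.4 server reports), pick_transport("server", fake_probe) selects
"udp" with version 2.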

get_nfs_version() (in amd/libamu/tranputil.c) calls clnttcp_create()
or clntudp_create() for NFS version 3. If that fails, it returns 0. Otherwise
it does a clnt_call() for NFSPROC_NULL. If that call fails, it retries with
NFS version 2 instead of 3. get_nfs_version() uses only local variables, and
seems to properly destroy the handle before exiting.
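
The control flow described above can be sketched as follows. This is a
model, not the real get_nfs_version(): struct rpc_ops, probe_nfs_version()
and the fake_* helpers are hypothetical, and the RPC layer is replaced by
function pointers so that the handle accounting can be checked without a
server. I also assume that a failed handle creation for version 2 returns
0, like the version 3 case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical abstraction of the RPC client layer. In amd the real calls
 * are clntudp_create()/clnttcp_create(), clnt_call() for NFSPROC_NULL,
 * and clnt_destroy(). */
struct rpc_ops {
    void *(*create)(const char *proto, unsigned long vers); /* NULL on failure */
    int   (*null_call)(void *handle);                       /* nonzero on success */
    void  (*destroy)(void *handle);
};

/* Try NFS version 3, then 2. Returns the first version whose null call
 * succeeds, or 0. Every handle that was created is destroyed. */
static unsigned long probe_nfs_version(const struct rpc_ops *ops,
                                       const char *proto)
{
    unsigned long vers;

    assert(ops != NULL && proto != NULL);

    for (vers = 3; vers >= 2; vers--) {
        void *h = ops->create(proto, vers);
        int ok;

        if (h == NULL)
            return 0;        /* creation failed: give up on this transport */
        ok = ops->null_call(h);
        ops->destroy(h);     /* release the handle on every path */
        if (ok)
            return vers;
    }
    return 0;                /* neither version answered the null call */
}

/* Fake RPC layer modelled on the SunOS 4.1.4 behaviour I observed:
 * UDP handles can be created, only version 2 answers the null call,
 * and TCP handle creation fails. */
struct fake_handle { unsigned long vers; };
static struct fake_handle fake_slot;
static int live_handles;     /* handles created but not yet destroyed */

static void *fake_create(const char *proto, unsigned long vers)
{
    if (strcmp(proto, "tcp") == 0)
        return NULL;
    fake_slot.vers = vers;
    live_handles++;
    return &fake_slot;
}

static int fake_null_call(void *h)
{
    return ((struct fake_handle *)h)->vers == 2;
}

static void fake_destroy(void *h)
{
    (void)h;
    live_handles--;
}

static const struct rpc_ops fake_sunos_ops = {
    fake_create, fake_null_call, fake_destroy
};
```

Running probe_nfs_version(&fake_sunos_ops, "udp") and then the same with
"tcp" should leave live_handles at 0 if no handle is leaked. This only
checks the model; it says nothing about state kept inside the RPC library
itself.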

Now the problem: with a SunOS 4.1.4 server, the calls are as follows:
get_nfs_version("udp") ->
	clntudp_create() for nfs version 3 succeeds
	clnt_call() fails
	clntudp_create() for nfs version 2 succeeds
	clnt_call() succeeds
	return 2
get_nfs_version("tcp") ->
	clnttcp_create() for nfs version 3 fails
	return 0
So find_nfs_srvr() will use NFS version 2 over UDP. But the mount fails,
for obscure reasons (tcpdump shows that the server answers the request, but
I don't know whether the answer is correct or not).
The cause seems to be that the last call was a call to clnttcp_create() which
failed. If I reverse the order of the transport probes, then clnttcp_create()
is called first and clntudp_create() for NFS version 2 last, and the mount
succeeds. If I leave the probes in this order, but add a call to
get_nfs_version() with the selected version and proto (which does
a single call to clntudp_create() for NFS version 2 and a clnt_call())
before the mount, the mount also succeeds.

I think this is a problem with the way RPCs are used, or a bug in our RPC, or
a bug in SunOS 4.1.4 RPC/NFS. Could someone have a look at get_nfs_version()
and confirm it does the right thing, and especially that it frees all the
allocated resources properly? I've looked at it, and didn't see anything
wrong, but I may very well have missed something.

--
Manuel Bouyer, LIP6, Universite Paris VI.           Manuel.Bouyer@lip6.fr
--