Subject: Re: NFS/RPC and server clusters
To: None <tech-net@NetBSD.org>
From: der Mouse <mouse@Rodents.Montreal.QC.CA>
List: tech-net
Date: 10/15/2003 14:54:19
>> Have you tried to use NFS over TCP instead of UDP?  This way the
>> client would establish the connection to the service IP.
> I had thought of it, and now I've tried and it works indeed.  (The
> kernel has to do a reconnect on the TCP socket after the failover
> which is a little bit more noisy.)

That's weird; there's no reason it should have to reconnect, since the
peer's IP address is remaining constant.

Now, if the MAC address behind that IP is changing, the arp tables may
need updating, and the reconnect may provoke that, or may be driven off
a similar timeout.  But that's a different issue.

> But it should work with UDP as well, because there is no point in
> paying the TCP overhead in a local network and because it works with
> every other NFS client around here.

My mount_nfs manpage lists an option

     -C      For UDP mount points, do a connect(2).  Although this flag in-
             creases the efficiency of UDP mounts it cannot be used for
             servers that do not reply to requests from the standard NFS port
             number 2049, or for servers with multiple network interfaces. In
             these cases if the socket is connected and the server replies
             from a different port number or a different network interface the
             client will get ICMP port unreachable and the mount will
             hang.
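
To make the connect(2) behaviour concrete, here is a minimal sketch in
Python rather than anything taken from mount_nfs or the kernel.  It
assumes loopback sockets are available; the function name
connected_udp_demo and the payloads are made up for illustration.  A
UDP socket that has been connect()ed to one peer only delivers
datagrams whose source address and port match that peer, which is why
a reply from a different port or interface never reaches the client.

```python
import socket


def connected_udp_demo(timeout=0.5):
    """Return (got_expected, got_stray) for a connected UDP socket.

    got_expected: a datagram from the connected peer was delivered.
    got_stray:    a datagram from some other port was delivered.
    """
    peer = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    other = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Three sockets on loopback, each on its own ephemeral port.
        peer.bind(("127.0.0.1", 0))
        other.bind(("127.0.0.1", 0))
        client.bind(("127.0.0.1", 0))
        client.settimeout(timeout)

        # The step that -C adds: fix the remote address and port.
        client.connect(peer.getsockname())

        def delivered(sender, payload):
            # Send to the client, then see whether recv() hands it over.
            sender.sendto(payload, client.getsockname())
            try:
                return client.recv(64) == payload
            except OSError:  # includes the recv timeout
                return False

        got_stray = delivered(other, b"stray")    # wrong source port
        got_expected = delivered(peer, b"reply")  # the connected peer
        return got_expected, got_stray
    finally:
        for s in (peer, other, client):
            s.close()


if __name__ == "__main__":
    print(connected_udp_demo())
```

The datagram from the unrelated port is discarded before the client
sees it, so in the NFS case the request simply goes unanswered from
the client's point of view.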

It sounds as though you may want to use this.  However, the original
message says

> What happens now is that NetBSD's "mount_nfs" does a "portmap" call
> to "service1", gets the reply from "node1", puts the "node1" address
> into the mount(2) argument structure and passes it to the kernel.

This sounds like a fairly severe bug, and it needs fixing.  I added an
option to my mount_nfs

     -Q      Explicitly specify the port number to be used for NFS traffic,
             rather than querying the portmapper on the target host.

which you may want to add to yours.  Or, you could hack mount_nfs.c so
that the portmap lookup does not get to bash the address used by the
mount call.  Exactly how you do this depends on what version you're
running, but it doesn't look hard from a short glance through the code.
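
As a rough illustration of the two fixes (an explicit port, or a
portmap lookup that contributes only the port), here is a small Python
sketch.  It is not the mount_nfs.c code: choose_nfs_endpoint, its
parameters, and the portmap_lookup callable are all hypothetical, and
falling back to port 2049 when no lookup is supplied is an assumption
of the sketch.

```python
def choose_nfs_endpoint(service_ip, explicit_port=None, portmap_lookup=None):
    """Return the (ip, port) pair to hand to the mount call.

    service_ip     -- the address the user asked to mount from
    explicit_port  -- a port given on the command line, in the style of -Q
    portmap_lookup -- hypothetical callable: ip -> (replying_ip, nfs_port)
    """
    if explicit_port is not None:
        # Explicit port: do not query the portmapper at all.
        return (service_ip, explicit_port)
    if portmap_lookup is None:
        # Assumption of this sketch: fall back to the standard NFS port.
        return (service_ip, 2049)
    _replying_ip, nfs_port = portmap_lookup(service_ip)
    # Take only the port from the reply; the address the reply came
    # from is deliberately ignored, so the service address survives.
    return (service_ip, nfs_port)


if __name__ == "__main__":
    # Stand-in for a portmapper whose reply comes from a node address.
    def fake_portmap(ip):
        return ("192.0.2.11", 2049)

    print(choose_nfs_endpoint("192.0.2.10", portmap_lookup=fake_portmap))
    print(choose_nfs_endpoint("192.0.2.10", explicit_port=2049))
```

Either way the address passed on is the one the user named, not the
one that happened to answer the portmap query.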

/~\ The ASCII				der Mouse
\ / Ribbon Campaign
 X  Against HTML	       mouse@rodents.montreal.qc.ca
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B