Subject: bin/4082: amd dumps core if a server is down
To: None <gnats-bugs@gnats.netbsd.org>
From: Matthieu Herrb <matthieu@laas.fr>
List: netbsd-bugs
Date: 09/04/1997 22:29:43
>Number:         4082
>Category:       bin
>Synopsis:       amd dumps core if a server is down
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    bin-bug-people (Utility Bug People)
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Sep  4 18:05:01 1997
>Last-Modified:
>Originator:     
>Organization:
 Matthieu Herrb   |  e-mail: matthieu@laas.fr
 CNRS/LAAS        |     url: http://www.laas.fr/~matthieu
 Toulouse, France |  War, what is it good for ? Absolutely nothing !
>Release:        NetBSD-current 08/20/97
>Environment:
	
System: NetBSD abel 1.2G NetBSD 1.2G (ABEL) #1: Wed Jul 2 13:35:44 MEST 1997 matthieu@abel:/usr/src/sys/arch/sparc/compile/ABEL sparc


>Description:
	If an NFS server used by an amd map is down, amd dumps core, and
	every subsequent access to an automounted directory then hangs.
>How-To-Repeat:

In the following amd map, the server pif-1 is down. The map is used on
/home. Run 'cd /home/bug' and watch amd dump core.

/defaults opts:=resvport,nosuid,noconn
matthieu	type:=nfs;rhost:=pif;rfs:=/users1;sublink:=matthieu
bug	type:=nfs;rhost:=pif-1;rfs:=/users1;sublink:=bug

Here is some debugging information from gdb:

Note that the IP address for pif-1 resolves correctly, although
fs->fs_ip is not initialized in the dump below (it is a null pointer):

abel# host pif-1
pif-1.laas.fr           A       140.93.160.46

Program received signal SIGSEGV (11), Segmentation fault
0x9894 in prime_nfs_fhandle_cache (path=0x27906 "/users1", fs=0x26b00, 
    fhbuf=0xf7ffee80, wchan=0x25d00) at /usr/src/usr.sbin/amd/amd/ops_nfs.c:353
353       if (fp->fh_sin.sin_addr.s_addr != fs->fs_ip->sin_addr.s_addr) {
(gdb) p fp
$1 = (fh_cache *) 0x25e00
(gdb) p *fp
$2 = {fh_q = {q_forw = 0x25c80, q_back = 0x1d724}, fh_wchan = 0x25d00, 
  fh_error = -1, fh_id = 3, fh_cid = 57, fh_nfs_version = 0, fh_nfs_handle = {
    v3 = {fhs_status = MNT3_OK, mountres3_u = {mountinfo = {fhandle = {
            fhandle3_len = 0, fhandle3_val = 0x0}, auth_flavors = {
            auth_flavors_len = 0, auth_flavors_val = 0x0}}}}, v2 = {
      fhs_status = 0, fhstatus_u = {
        fhs_fhandle = '\000' <repeats 31 times>}}}, fh_sin = {
    sin_len = 0 '\000', sin_family = 0 '\000', sin_port = 0, sin_addr = {
      s_addr = 0}, sin_zero = "\000\000\000\000\000\000\000"}, fh_fs = 0x0, 
  fh_path = 0x0}
(gdb) p fs
$3 = (fserver *) 0x26b00
(gdb) p *fs
$4 = {fs_q = {q_forw = 0x26980, q_back = 0x1d140}, fs_refc = 1, 
  fs_host = 0x27960 "pif-1.laas.fr", fs_ip = 0x0, fs_cid = 56, fs_pinger = 30, 
  fs_flags = 21, fs_type = 0x5ca0 "nfs", fs_version = 0, 
  fs_proto = 0x52d0 "udp", fs_private = 0x27980, 
  fs_prfree = 0x1c118 <_DYNAMIC+280>}
(gdb) p fs->fs_ip
$5 = (struct sockaddr_in *) 0x0
(gdb) quit
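
The crash at ops_nfs.c:353 is a dereference of fs->fs_ip while it is
null. As an illustration only, here is a minimal, self-contained sketch
of that comparison with a guard added. The types mini_fserver and
mini_fh_cache and the function same_server_addr are hypothetical,
cut-down stand-ins invented for this example; they are not amd's real
declarations, and this is not the fix proposed in this report.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <netinet/in.h>

/* Hypothetical stand-in for amd's fserver: only the fields used here. */
struct mini_fserver {
    const char *fs_host;
    struct sockaddr_in *fs_ip;   /* null when no address is recorded */
};

/* Hypothetical stand-in for amd's fh_cache entry. */
struct mini_fh_cache {
    struct sockaddr_in fh_sin;
};

/*
 * Compare the cached address with the server's address.
 * Returns 1 if they match, 0 if they differ, and -1 if the server
 * has no address, instead of dereferencing a null pointer.
 */
static int
same_server_addr(const struct mini_fh_cache *fp, const struct mini_fserver *fs)
{
    assert(fp != NULL);
    if (fs == NULL || fs->fs_ip == NULL)
        return -1;
    return fp->fh_sin.sin_addr.s_addr == fs->fs_ip->sin_addr.s_addr;
}
```

With a guard of this kind a caller could treat -1 as "server unavailable"
and fail the mount cleanly; whether amd should do that or keep fs_ip
valid at all times is a design choice this sketch does not settle.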


>Fix:
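This patch seems to fix the problem and restore the correct behaviour.
It removes the code that frees the server address and sets it to 0 when
no NFS version could be determined: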

--- amd/srvr_nfs.c.orig	Fri Jul 25 13:26:52 1997
+++ amd/srvr_nfs.c	Thu Sep  4 22:26:09 1997
@@ -721,11 +721,6 @@
       }
       nfs_version = best_nfs_version;
     }
-
-    if (!nfs_version) {
-      free((voidp)ip);
-      ip = 0;			/* Server probably down - no ping responce */
-    }
 #else /* not HAVE_FS_NFS3 */
     nfs_version = NFS_VERSION;
 #endif /* not HAVE_FS_NFS3 */
>Audit-Trail:
>Unformatted: