Subject: bin/20768: amd has sometimes problem remounting
To: None <gnats-bugs@gnats.netbsd.org>
From: Manuel Bouyer <bouyer@asim.lip6.fr>
List: netbsd-bugs
Date: 03/17/2003 12:56:58
>Number:         20768
>Category:       bin
>Synopsis:       amd has sometimes problem remounting
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    bin-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Mar 17 03:58:01 PST 2003
>Closed-Date:
>Last-Modified:
>Originator:     
>Release:        NetBSD 1.6.1_RC2
>Organization:

LIP6, Universite Paris VI.

>Environment:
System: NetBSD armandeche 1.6.1_RC2 NetBSD 1.6.1_RC2 (ARMANDECHE) #5: Fri Mar 14 15:21:13 CET 2003 bouyer@folk:/local/folk1/bouyer/netbsd-1-6/src/sys/arch/alpha/compile/ARMANDECHE alpha
Architecture: alpha
Machine: alpha
Problem also seen on i386

>Description:

I use amd with the 'net' example map:
#cat /etc/amd.conf
[ global ]
dismount_interval =     900

[ /net ]
map_name =              /etc/amd/net
#cat /etc/amd/net
# $NetBSD: net,v 1.2 1997/12/12 11:52:55 hubertf Exp $
#
# /net - NFS-mount directory by cd'ing into it: cd /net/host/filesystem;
#        be sure to mkdir /net before using this.
#
/defaults       type:=host;rhost:=${key};fs:=${autodir}/${rhost}/root
*               host==${key};type:=link;fs:=/                           \
                host!=${key};opts:=rw,hard,intr,nodev,nosuid,noconn

This worked fine with NetBSD 1.6. With 1.6.1_RC2 I have started seeing the
following problem: I cause amd to mount a remote directory, keep it busy for
some time, and then release it (just doing
cd /net/some/server; sleep <a few hours>; cd /
seems enough to reproduce the problem). A few minutes later I try to
mount it again. The access from the shell fails with EIO, and
/var/log/messages shows:
Mar 17 12:34:58 armandeche amd[13622]: mountd rpc failed: RPC: Unable to receive
Mar 17 12:34:58 armandeche amd[13622]: mountd rpc failed: RPC: Unable to receive
Mar 17 12:34:58 armandeche amd[144]: Process 13622 exited with signal 13
Mar 17 12:34:58 armandeche amd[144]: mount for /net/jazz got signal 13

Waiting a bit longer usually lets the mount succeed.
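
For reference, the signal number in the log lines above can be mapped to a
name from the shell. This is only a lookup aid and not a diagnosis; on NetBSD
signal 13 is SIGPIPE, which is what the command below is expected to report:

```sh
# Print the symbolic name of signal 13 (PIPE, i.e. SIGPIPE, on NetBSD).
kill -l 13
```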

>How-To-Repeat:
	Use the above amd maps. The failure is somewhat random, but it usually
	follows a sequence like:
	cd /net/some/server
	sleep <a few hours>
	cd /
	sleep 650
	cd /net/some/server
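
	The sequence above can be wrapped in a small script so that it is
	easier to rerun. This is only a sketch: the function name repro, its
	three arguments, and the 3-hour default for "a few hours" are
	illustrative assumptions and were not part of the original test.

```sh
#!/bin/sh
# Hypothetical helper that replays the sequence from this report.
#   $1  directory under the amd mount point (default: /net/some/server)
#   $2  seconds to keep the mount busy (assumed default: 10800, i.e. 3 hours)
#   $3  seconds to wait after leaving the mount (default: 650, as in the report)
# The body runs in a subshell so the caller's working directory is untouched.
repro() (
    dir=${1:-/net/some/server}
    busy=${2:-10800}
    idle=${3:-650}

    cd "$dir" || exit 1      # first access: amd mounts the remote directory
    sleep "$busy"            # keep the mount busy
    cd / || exit 1           # release the mount
    sleep "$idle"            # wait before touching it again

    # Second access: this is where the EIO was observed.
    if cd "$dir"; then
        echo "remount ok"
    else
        echo "remount failed"
        exit 1
    fi
)
```

	After sourcing the file, run repro with no arguments to use the values
	from this report, or for example repro /net/jazz 7200 650 to override
	them. A non-zero exit status means one of the cd steps failed.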
	
>Fix:
	Unknown. What does the "Unable to receive" error mean?
>Release-Note:
>Audit-Trail:
>Unformatted: