Subject: Re: NFS problems with 1.5.1_BETA2
To: Frank van der Linden <fvdl@wasabisystems.com>
From: Tracy J. Di Marco White <gendalia@iastate.edu>
List: port-i386
Date: 06/17/2001 11:29:08
Frank van der Linden wrote:
}On Mon, Jun 11, 2001 at 01:34:54PM -0500, Tracy Di Marco White wrote:
}> 
}> Two Linux machines that nfs mount a
}> filesystem from this server see 30 second plus timeouts before the
}> mount succeeds.  Another Linux machine is diskless, the mount of the
}> root file system started failing Sunday (before the update to latest
}> source, and the reason for it), and the machine panics when it isn't
}> able to mount the root file system.  The Linux boxes are all showing
}> error -5.
}
}I have no idea what this could be. There weren't any relevant changes
}to mountd or nfsd that could cause this. If you could send me a tcpdump
}output of the relevant exchange, I might be able to track what's
}going on there.

On the Linux side, an interesting detail is that enabling portmap
seems to fix the timeouts.  This doesn't help with the diskless box,
since it can't run portmap until it has mounted its filesystems.  The
odd bit about the diskless box is that it worked, then stopped working,
and the only change we can track down is that the switch it's connected
to had its uplink upgraded to 100Mb.  After consistently failing to boot
all day Sunday, the diskless machine booted fine Friday, still on
100Mb.  The fileserver has been rebooted a couple times, and while
that didn't seem to fix the diskless machine when it was not booting,
it has fixed mounting problems with NetBSD machines.

}> I'm also seeing problems with NetBSD machines (current-1.5U & 1.5.1_BETA2)
}> mounting NFS from the file server.  Mostly I'm just seeing alternating
}> "fileserver not responding" "fileserver is alive" on the NetBSD machines.
}
}I've seen that kind of ping-pong message behavior before. I am not
}completely sure what causes it, but you could try using TCP for the
}clients and see if that stabilizes things.

I'll try that.  I'm usually only seeing it when I'm compiling in a
remotely mounted source tree.  I've started moving all the obj
directories local, which has sped things up all around.  I have
the fileserver running rpc.lockd & rpc.statd, FYI.
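
For reference, a minimal sketch of what the TCP mount would look like on
a NetBSD client.  The export path "fileserver:/export" and the mount
point "/mnt/src" are placeholders, not the real names here; -T is the
mount_nfs flag that selects TCP instead of the default UDP.  The snippet
only prints the command, so it is safe to run as a dry run:

```shell
# Placeholder names (assumptions): substitute the real export and mount point.
server_path="fileserver:/export"
mount_point="/mnt/src"

# -T tells NetBSD's mount_nfs to use TCP transport rather than UDP.
mount_cmd="mount_nfs -T $server_path $mount_point"

# Dry run: show the command instead of executing it.
echo "$mount_cmd"
```

Running the printed command as root on the client would do the actual
mount.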

Yesterday another NetBSD 1.5.1_BETA2 machine was unable to mount the
exported file system.  authlog shows:
Jun 17 01:09:09 lyra rpcbind: connect from 192.168.69.12 to getport/addr(nfs)
Jun 17 01:10:10 lyra rpcbind: connect from 192.168.69.12 to getport/addr(nfs)
1022
Jun 17 01:11:07 lyra rpcbind: connect from 192.168.69.12 to getport/addr(mountd)
Jun 17 01:11:13 lyra rpcbind: connect from 192.168.69.12 to getport/addr(mountd)
Jun 17 01:11:59 lyra rpcbind: connect from 192.168.69.12 to getport/addr(nfs)
Jun 17 01:12:44 lyra rpcbind: connect from 192.168.69.12 to getport/addr(nfs)
Jun 17 01:13:14 lyra rpcbind: connect from 192.168.69.12 to getport/addr(mountd)
The mountd requests were from me running showmount.  The nfs requests were
from me trying to mount the exported fs.  I never got a mountd request from
trying to mount the fs.
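
A quick way to tally those lookups per service, as a rough sketch.  The
sample is the authlog excerpt above pasted into a shell variable; in
practice the same grep could be pointed at the log file itself (its
location is an assumption that depends on the syslog configuration):

```shell
# Sample copied from the authlog excerpt in this message.
log='Jun 17 01:09:09 lyra rpcbind: connect from 192.168.69.12 to getport/addr(nfs)
Jun 17 01:10:10 lyra rpcbind: connect from 192.168.69.12 to getport/addr(nfs)
1022
Jun 17 01:11:07 lyra rpcbind: connect from 192.168.69.12 to getport/addr(mountd)
Jun 17 01:11:13 lyra rpcbind: connect from 192.168.69.12 to getport/addr(mountd)
Jun 17 01:11:59 lyra rpcbind: connect from 192.168.69.12 to getport/addr(nfs)
Jun 17 01:12:44 lyra rpcbind: connect from 192.168.69.12 to getport/addr(nfs)
Jun 17 01:13:14 lyra rpcbind: connect from 192.168.69.12 to getport/addr(mountd)'

# Count rpcbind port lookups per service; -F matches the string literally.
nfs_count=$(printf '%s\n' "$log" | grep -c -F 'getport/addr(nfs)')
mountd_count=$(printf '%s\n' "$log" | grep -c -F 'getport/addr(mountd)')

echo "nfs=$nfs_count mountd=$mountd_count"
```

Since every mountd lookup in that window lines up with a showmount run,
the count supports the point that the mount attempts themselves never
got as far as asking for mountd.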

After rebooting the fileserver:
Jun 17 01:18:44 lyra rpcbind: connect from 192.168.69.12 to getport/addr(nfs)
Jun 17 01:18:44 lyra rpcbind: connect from 192.168.69.12 to getport/addr(mountd)
and the exported fs mounts just fine.  I don't have any useful
debugging information, however.

Tracy J. Di Marco White
Project Vincent Systems Manager
gendalia@iastate.edu