Subject: NFS mounts hanging
To: tech-kern@netbsd.org
From: rick@snowhite.cis.uoguelph.ca
List: tech-kern
Date: 10/03/2000 12:09:18
I'm way out of date on the status of the NetBSD NFS code, but maybe this
will help:
- Soft mounts will fail after N retries, whereas a hard mount will just
keep on trying. Unless the code's busted, a hard mount should eventually
continue after the server has rebooted.
- Since the retransmit timeout does back off quite a bit, it may take a
while for the client to retransmit and start up again after the server
is rebooted.
- I agree with the statement w.r.t. UDP vs TCP, since it can take up to
15 minutes (worst case) for a TCP-mounted client to recognize a server
reboot and do a remount, whereas for UDP it should only take a couple of
minutes.
- With 500 clients, you might also have a "storm" of requests hitting the
server when it reboots, and this may be causing server congestion (just
like a network congestion, but for NFS RPCs). If this is the case, things
won't recover gracefully, and rebooting the clients a few at a time might
be the only way to get things going again.
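To make the soft/hard and backoff points above a bit more concrete, here is a toy model of a single RPC's retransmit loop. It is only a sketch under assumed numbers: the 1-second initial timeout, the doubling per retry, the 60-second cap and the function name are all hypothetical, not the actual NetBSD values or code. A soft mount gives up after max_retries retransmissions; a hard mount keeps going until a retransmission lands after the server is back.

```python
# Toy model of an NFS client's retransmit loop for one RPC.
# ASSUMPTIONS (hypothetical, not NetBSD's real values): the timeout starts
# at `initial` seconds, doubles after every unanswered send, and is capped
# at `cap`. Replies are treated as instantaneous once the server is up.

def retransmit_outcome(server_up_at, initial=1.0, cap=60.0,
                       soft=False, max_retries=5):
    """Return (elapsed_seconds, "ok" | "timeout") for one RPC.

    server_up_at: seconds after the first send at which the server
    starts answering again (anything sent earlier is lost).
    """
    elapsed = 0.0      # time of the current (re)transmission
    timeout = initial  # how long we wait for a reply to this send
    sends = 1
    while elapsed < server_up_at:
        # No reply: wait out the current timeout.
        elapsed += timeout
        # A soft mount gives up once max_retries retransmissions have failed.
        if soft and sends > max_retries:
            return elapsed, "timeout"
        # Back off before the next retransmission.
        timeout = min(timeout * 2, cap)
        sends += 1
    return elapsed, "ok"


if __name__ == "__main__":
    # Server is back 100s after the first send.
    print(retransmit_outcome(100))             # hard mount
    print(retransmit_outcome(100, soft=True))  # soft mount
```

With these made-up numbers the hard mount resends at 0, 1, 3, 7, 15, 31, 63 and 123 seconds, so a server that is back at 100 seconds isn't noticed until 123 seconds, which is the "it may take a while" effect of the backoff. The soft mount gives up at 63 seconds and the application sees an error.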
All in all, with 500 clients, a reliable NFS server is a must, IMHO.
Good luck with it, rick
ps: As a fellow sysadmin, I'm glad I'm not in your shoes right now:-)