Subject: NFS mounts hanging
To: None <tech-kern@netbsd.org>
From: None <rick@snowhite.cis.uoguelph.ca>
List: tech-kern
Date: 10/03/2000 12:09:18
I'm way out of date on the status of the NetBSD NFS code, but maybe this
will help:

- Soft mounts will fail after N retries, whereas a hard mount will just
  keep on trying. Unless the code's busted, a hard mount should eventually
  continue on after the server has rebooted.
- Since the retransmit timeout does back off quite a bit, it may take a
  while for the client to retransmit and start up again after the server
  is rebooted.
- I agree with the statement w.r.t. UDP vs TCP, since it can take up to
  15 minutes (worst case) for a TCP-mounted client to recognize a server
  reboot and do a remount, whereas for UDP it should only be a couple of
  minutes.
- With 500 clients, you might also have a "storm" of requests hitting the
  server when it reboots, and this may be causing server congestion (just
  like network congestion, but for NFS RPCs). If this is the case, things
  won't recover gracefully, and rebooting the clients a few at a time might
  be the only way to get things going again.
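
To make the backoff point concrete, here is a rough sketch of how a
backed-off retransmit timer delays recovery. It is not taken from the
NetBSD code: the 1 second initial timeout, the doubling factor, the
60 second ceiling and all the function names are assumptions picked
purely for illustration.

```python
from itertools import islice

def retransmit_timeouts(initial=1.0, factor=2.0, ceiling=60.0):
    """Yield successive retransmit timeouts (seconds), backing off up to a ceiling.

    All three parameters are illustrative assumptions, not NetBSD defaults.
    """
    timeout = initial
    while True:
        yield timeout
        timeout = min(timeout * factor, ceiling)

def soft_mount_give_up_time(retries, **kwargs):
    """Seconds until a soft mount fails, i.e. the sum of the first `retries` timeouts."""
    return sum(islice(retransmit_timeouts(**kwargs), retries))

def first_retransmit_after(outage, **kwargs):
    """Time of the first hard-mount retransmit sent at or after the server returns.

    The original request goes out at t=0; each retransmit follows the
    previous send by the current (backed-off) timeout.
    """
    sent_at = 0.0
    for timeout in retransmit_timeouts(**kwargs):
        if sent_at >= outage:
            return sent_at
        sent_at += timeout

if __name__ == "__main__":
    print(list(islice(retransmit_timeouts(), 8)))
    print(soft_mount_give_up_time(5))
    print(first_retransmit_after(100.0))
```

Under those assumed numbers, a server that comes back 100 seconds after
the first request was sent is not retried until the 123 second mark, so
the client sits idle for a while even though the server is up again. A
soft mount limited to 5 tries would have given up long before that.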

All in all, with 500 clients, a reliable NFS server is a must, IMHO.

Good luck with it, rick
ps: As a fellow sysadmin, I'm glad I'm not in your shoes right now:-)