Subject: Re: OT: NFS timeout question
To: None <firstname.lastname@example.org>
From: Christos Zoulas <email@example.com>
Date: 12/07/2003 06:05:45
In article <20031206214444.A686@cs24279-4.austin.rr.com>,
Brian Grayson <firstname.lastname@example.org> wrote:
> (This is off-topic, but I have great respect for the knowledge
>of NetBSD users and developers, and I and the sysadmins at work are sort
> At work, we're currently having an NFS problem that,
>unfortunately, only I can demonstrate (I'm probably
>one of the most annoying^H^H^H^H^H^H^H^Hdemanding users at work!). What
>appears to happen is that, when the fileserver is really overloaded (loads over 50 on
>a 16-processor modern Sun), some scripts appear to have their NFS
>read/getattr/etc. operations time out instead of just hanging until the
>server can handle the request. This causes error messages like being
>unable to change to my home directory, permission denied on my bin/
>directory, etc. I've been able to correlate my failures to entries in
>/var/adm/messages saying "server XXX not responding."
What are the errno values when the system calls fail?
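If the failing scripts don't report errno themselves, something like the
following could be run against the affected paths while the server is loaded.
This is only a rough sketch, assuming Python is available on the client; the
function name probe is made up for illustration.

```python
import errno
import os


def probe(path):
    """Try one stat() on path and report the errno name if it fails."""
    try:
        os.stat(path)
    except OSError as exc:
        # errno.errorcode maps the number to its symbolic name, e.g. ENOENT.
        return exc.errno, errno.errorcode.get(exc.errno, "UNKNOWN")
    return 0, "OK"


if __name__ == "__main__":
    import sys
    # Usage (hypothetical): python probe.py ~ ~/bin
    for p in sys.argv[1:]:
        print(p, *probe(p))
```

Knowing whether the failure is a timeout-style error, EINTR, EACCES, or ESTALE
would narrow things down considerably.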
> It can also cause file corruption if you use the >> operator --
>Solaris sh does an open() request first, and if that fails, does a
>creat(). If the open() on a _valid_ file fails due to NFS weirdness, it
>ends up trunc'ing the file when it does the creat(). I've seen this
>truncating behavior twice so far on my files.
Eww, Solaris sh.
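For comparison, an append that cannot truncate does a single open with
O_APPEND|O_CREAT and no O_TRUNC, rather than falling back to creat(). A minimal
sketch, assuming you can route the writes through your own helper instead of
the shell's >> (append_line is a hypothetical name):

```python
import os


def append_line(path, text):
    """Append text to path, creating the file only if it does not exist."""
    # O_CREAT without O_TRUNC never empties an existing file, so a failed
    # open raises an error instead of falling through to a truncating creat().
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        os.write(fd, text.encode())
    finally:
        os.close(fd)
```

If the open fails because of the NFS problem, you get an exception and the
existing file is left alone. Note that O_APPEND over NFS is not atomic between
clients, so this only addresses the truncation, not concurrent writers.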
> My limited NFS knowledge says, if you have a hard mount, your
>system calls should _NEVER_ fail, they will just take a Really Long Time
>to complete. Am I wrong?
No, you are right.
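That assumes the mount really is hard and not interruptible, so it is worth
confirming the options actually in effect on the problem clients. A small
sketch of the check, assuming you paste in the option string by hand from
whatever mount or nfsstat -m prints on your system (the output format varies
by platform and is not parsed here; mount_failure_modes is a hypothetical
name):

```python
def mount_failure_modes(options):
    """Given a comma-separated NFS option string, list ways calls can fail."""
    opts = {o.strip() for o in options.split(",") if o.strip()}
    modes = []
    if "soft" in opts:
        modes.append("soft: calls can return an error once retries run out")
    if "intr" in opts:
        modes.append("intr: a signal can interrupt a blocked call with EINTR")
    return modes


if __name__ == "__main__":
    # Example option string; replace it with the one from your client.
    for line in mount_failure_modes("rw,soft,intr"):
        print(line)
```

An empty result for a hard mount is what you would expect if the calls are
supposed to block until the server answers.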
> Does anyone have any ideas on how to debug this further? I tried
>using nfsstat on the clients, but didn't see much different behavior
>between a machine that seems to have lots of problems, and an older
>machine that doesn't. Since I'm a lowly user at work, I can't even
>log on to the fileserver to dig around there, so I have to rely on
>asking the sysadmins to dig around.