Subject: advice debugging slow NFS
To: netbsd-help@netbsd.org
From: Chris Jones <chris@cjones.org>
List: netbsd-help
Date: 06/04/2003 17:30:17
I know a thing or two about NFS, but I'm not sure where to go to debug 
this problem.  I have a NetBSD 1.6 NFS server which is being used by a 
number of clients running NetBSD, Solaris, and Linux.  On one particular 
Linux machine, I'm getting very bad performance under heavy load.

The Linux machine (Red Hat 7.1, kernel 2.4.18) seems to think it's getting 
timeouts from the server:

Jun  4 17:16:01 mothra kernel: nfs: server gamera not responding, still trying
Jun  4 17:16:04 mothra kernel: nfs: server gamera OK
Jun  4 17:16:14 mothra kernel: nfs: server gamera not responding, still trying
Jun  4 17:16:17 mothra kernel: nfs: server gamera OK
Jun  4 17:16:44 mothra kernel: nfs: server gamera not responding, still trying
Jun  4 17:16:50 mothra kernel: nfs: server gamera OK
Jun  4 17:17:01 mothra kernel: nfs: server gamera not responding, still trying
Jun  4 17:17:03 mothra kernel: nfs: server gamera OK
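
One way to put numbers on that pattern: the short Python sketch below pairs each "not responding" message with the following "OK" and reports how long each stall lasted. It is only an illustration, assuming the standard syslog timestamp layout shown above with the wrapped "still trying" lines rejoined; because syslog omits the year, the sketch borrows 2003 from the message date.

```python
from datetime import datetime

# Excerpt from the Linux client's syslog (wrapped lines rejoined).
LOG = """\
Jun  4 17:16:01 mothra kernel: nfs: server gamera not responding, still trying
Jun  4 17:16:04 mothra kernel: nfs: server gamera OK
Jun  4 17:16:14 mothra kernel: nfs: server gamera not responding, still trying
Jun  4 17:16:17 mothra kernel: nfs: server gamera OK
Jun  4 17:16:44 mothra kernel: nfs: server gamera not responding, still trying
Jun  4 17:16:50 mothra kernel: nfs: server gamera OK
Jun  4 17:17:01 mothra kernel: nfs: server gamera not responding, still trying
Jun  4 17:17:03 mothra kernel: nfs: server gamera OK
"""

def parse_stamp(line, year=2003):
    """Parse the 15-character syslog prefix, e.g. 'Jun  4 17:16:01'.

    Syslog omits the year, so one is supplied (assumed from the mail date).
    """
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")

def stall_durations(lines):
    """Seconds from the start of each stall ('not responding') to the next 'OK'."""
    durations = []
    start = None
    for line in lines:
        if "not responding" in line:
            # A repeated warning within the same stall keeps the original start.
            if start is None:
                start = parse_stamp(line)
        elif line.rstrip().endswith(" OK") and start is not None:
            durations.append(int((parse_stamp(line) - start).total_seconds()))
            start = None
    return durations

if __name__ == "__main__":
    print(stall_durations(LOG.splitlines()))
```

On this excerpt it reports stalls of 3, 3, 6 and 2 seconds, so roughly 14 of the 62 seconds between the first and last message were spent waiting on gamera.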

Here's nfsstat output from the client:

Client rpc stats:
calls      retrans    authrefrsh
4480473    20869      0
Client nfs v2:
null       getattr    setattr    root       lookup     readlink
0       0% 322463  7% 99276   2% 0       0% 2799942 64% 74      0%
read       wrcache    write      create     remove     rename
282297  6% 0       0% 386678  8% 149799  3% 142597  3% 27050   0%
link       symlink    mkdir      rmdir      readdir    fsstat
1369    0% 1       0% 23728   0% 29016   0% 58119   1% 4       0%

...and the server:

Server Info:
RPC Counts: (9877562 calls)
       null         getattr         setattr          lookup          access
          0  0%     2448440 24%      162085  1%     4053079 41%      599799  6%
   readlink            read           write          create           mkdir
       1524  0%      754165  7%      875915  8%      261489  2%       49522  0%
    symlink           mknod          remove           rmdir          rename
       1011  0%           0  0%      214657  2%       69273  0%       58719  0%
       link         readdir     readdirplus          fsstat          fsinfo
       2810  0%      130735  1%        3139  0%      184720  1%          70  0%
   pathconf          commit        getlease         vacated         evicted
         20  0%        6389  0%           0  0%           0  0%           0  0%
       noop
          1  0%
Server Errors:
RPC errors          faults
     768950               0
Server Cache Stats:
inprogress            idem        non-idem          misses
       8597             171              66         9806553
Server Lease Stats:
     leases       maxleases       getleases
          0               0               0
Server Write Gathering:
     writes       write RPC       OPs saved
     828567          875915           47348  5%

Note that the server has a large number of RPC errors (768950, nearly 8% 
of all calls), and the client has a fair number (0.5%) of retransmissions.  
I don't see any significant errors on the network interfaces (from "netstat -i").
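
The two percentages can be checked directly from the counters. A minimal Python sketch, with the values copied by hand from the nfsstat output above:

```python
# Counters copied by hand from the nfsstat output above.
CLIENT_CALLS = 4480473      # "Client rpc stats: calls"
CLIENT_RETRANS = 20869      # "Client rpc stats: retrans"
SERVER_CALLS = 9877562      # "RPC Counts: (... calls)"
SERVER_RPC_ERRORS = 768950  # "Server Errors: RPC errors"

def pct(part, whole):
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole

if __name__ == "__main__":
    print(f"client retransmissions: {pct(CLIENT_RETRANS, CLIENT_CALLS):.2f}%")    # 0.47%
    print(f"server RPC errors:      {pct(SERVER_RPC_ERRORS, SERVER_CALLS):.2f}%")  # 7.78%
```

Keep in mind that the server counters cover every client (NetBSD, Solaris and Linux alike), so these numbers alone cannot pin the RPC errors on the Linux machine.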

What's the next step in debugging this?

Chris

-- 
Chris Jones               chris@cjones.org                www.cjones.org