Subject: Re: nfsd server usage seems unevenly distributed.
To: Simon Burge <simonb@wasabisystems.com>
From: Stephen M. Jones <smj@cirr.com>
List: tech-net
Date: 01/15/2004 17:47:03
> As I understand it (admittedly at a simple level!), nfsd's are tried in
> order.  If one is busy, then the next on the list is used and so on.

This was my assumption as well, but the clients complaining about the
server not responding tells me there is some latency there, where the
request isn't being picked up by the idle (non-pigged-out) processes.

 
> >     0 28004 28003   1  -5  0    88    808 biowait  DL   ??  199:01.50 nfsd: ser
> >     0 28005 28003   0   2  0    88    808 nfsd     SL   ??   50:20.35 nfsd: ser
> >     0 28006 28003   0   2  0    88    808 nfsd     SL   ??   16:03.24 nfsd: ser
> >     0 28007 28003   0   2  0    88    808 nfsd     SL   ??    5:22.58 nfsd: ser
> >     0 28008 28003   0   2  0    88    808 nfsd     SL   ??    2:07.91 nfsd: ser
> >     0 28009 28003   0   2  0    88    808 nfsd     SL   ??    0:57.50 nfsd: ser
> >     0 28010 28003   0   2  0    88    808 nfsd     SL   ??    0:28.92 nfsd: ser
..(truncated)..
 
> To me, that says you can get away with about 6 or 8 nfsd's with your
> current load.  I'm sure it doesn't hurt to have more than that, as long
> as you don't have a really small memory of RAM on your box.

I did have 8 running before I bumped it up to 20 a few months ago to
help with this "not responding"/"responding" issue.  Memory isn't a
problem, but did you note the biowait on the first nfsd server?
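
To put a number on how lopsided the usage is, here is a rough sketch that
turns the TIME column from the ps listing above into a share per nfsd.
It only covers the seven processes shown (the rest of the listing was
truncated), so the shares are relative to those seven, not to all 20.
The helper name to_seconds is just something made up for this sketch.

```python
# CPU times (mm:ss.ss) copied from the TIME column of the ps output above,
# for PIDs 28004 through 28010. The listing was truncated, so the other
# nfsd processes are not included here.
times = ["199:01.50", "50:20.35", "16:03.24", "5:22.58",
         "2:07.91", "0:57.50", "0:28.92"]
pids = list(range(28004, 28011))


def to_seconds(t):
    """Convert a ps-style 'minutes:seconds' string to seconds."""
    minutes, seconds = t.split(":")
    return int(minutes) * 60 + float(seconds)


secs = [to_seconds(t) for t in times]
total = sum(secs)
shares = [s / total for s in secs]

# Print each listed nfsd's share of the CPU time among the seven shown.
for pid, share in zip(pids, shares):
    print(f"{pid}: {share:6.1%}")
```

Among the seven listed, the first nfsd accounts for well over half of the
accumulated CPU time, and each one after it has used a fraction of the
one before.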

> Look at the output of netstat -i and look for errors and collisions, and
> ping,etc for dropped packets.  Pretty unlikely on a local 100Mb network
> though, but worth checking...

Name  Mtu   Network       Address          Ipkts Ierrs      Opkts Oerrs Colls
tlp1  1500  <Link>  08:00:2b:86:f7:4b 2239153328    17 4927855769     3     0
tlp1  1500  10      10.0.0.20         2239153328    17 4927855769     3     0

I'd say those errors are moot given the number of packets transferred.
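
As a quick sanity check on that, a small sketch that works out the error
rate from the netstat -i counters above. The numbers are copied straight
from the tlp1 line; the function name error_rate is only for illustration.

```python
# Counters copied from the netstat -i output above (tlp1 interface).
ipkts, ierrs = 2239153328, 17
opkts, oerrs = 4927855769, 3


def error_rate(errors, packets):
    """Return the fraction of packets that were counted as errors."""
    return errors / packets


in_rate = error_rate(ierrs, ipkts)
out_rate = error_rate(oerrs, opkts)

# Print the rates in scientific notation, since they are very small.
print(f"input:  {in_rate:.2e} errors per packet")
print(f"output: {out_rate:.2e} errors per packet")
```

Both rates come out below one error per hundred million packets, which
is why I don't think the interface is the culprit.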

> I haven't looked under the hood to see if round-robin usage of the
> nfsd's would be easy (or even possible).

Starting around line 730 of nfsd.c, no doubt.  So are all requests
handled by the master nfsd and then doled out to the forked servers?