Subject: Re: Raising NFS parameters for higher bandwidth or "long fat pipe"
To: Jonathan Stone <jonathan@DSG.Stanford.EDU>
From: Jason Thorpe <thorpej@wasabisystems.com>
List: tech-net
Date: 12/01/2003 16:02:10

Sorry for the delay in responding to this.  A number of factors
contributing to my workload, combined with the holiday weekend, pretty
much made me decide to shut my brain off for a few days.  Doing so was
quite a relief :-)

On Nov 25, 2003, at 5:37 PM, Jonathan Stone wrote:

> I frequently overload NetBSD-current NFS servers in configurations
> where (for example) single servers have several FC controllers, dozens
> of 10,000 RPM FC disks, and tens-to-hundreds of client threads banging
> on the disks.  (Think specsfs, for those of you familiar with that.)

Yes, quite familiar (well, with SATA or iSCSI on the back-end instead 
of that legacy FC stuff :-)

> The NFS parameters in -current aren't well-suited to that workload: in
> particular, the compiled-in upper bounds on nfsd threads, and on
> (client-side) readahead, are too low.  I find 64 nfsds is not nearly
> sufficient; I usually run 128 or 256.  I'd also like the amount of
> read-ahead to be sufficient for latencies in the 10ms to 20ms range.

No disagreement from me there.

> I don't propose to change any of the default values,
> but I would like to raise some compiled-in upper bounds:
>
> usr.sbin/nfsd/nfsd.c: MAXNFSDCNT to 128
> 	      still on the low side, for specsfs runs.

I don't see any good reason to have a compiled-in upper bound on this 
at all, especially in a userland tool.  I'd say just nuke it.  Only the 
superuser is going to be able to tweak the parameter anyway, so let's 
just assume that the superuser isn't a complete idiot and supply them 
with as much rope as they want.
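
For illustration, here's a minimal, self-contained sketch of what the
option parsing could look like with the clamp gone.  This is not the
actual nfsd.c code, and the -n option name and default of 4 are just
assumptions; the point is that the count is only checked for being
positive, and the superuser gets whatever they ask for:

	#include <err.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>

	int
	main(int argc, char *argv[])
	{
		int ch, nfsdcnt = 4;	/* placeholder default server count */

		while ((ch = getopt(argc, argv, "n:")) != -1) {
			switch (ch) {
			case 'n':
				nfsdcnt = atoi(optarg);
				/* sanity check only; no MAXNFSDCNT clamp */
				if (nfsdcnt < 1)
					errx(1, "invalid server count: %s",
					    optarg);
				break;
			default:
				errx(1, "usage: nfsd [-n num_servers]");
			}
		}
		printf("would start %d nfsd server threads\n", nfsdcnt);
		return 0;
	}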

More comments on nfsd below.

> sys/nfs/nfs.h NFS_MAXRAHEAD from 4 to 32
> 	      (32 requests * 32k reads comes to 1 Mbyte), barely
> 	      enough to fill a 10ms latency at 100 Mbyte/sec.

This seems perfectly OK, too.
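
Just as a sanity check of the arithmetic in that comment, here's a
trivial bandwidth-delay calculation (the 100 MByte/sec and 10ms figures
are taken straight from the quoted text, and 32 KByte is the assumed
read size):

	#include <stdio.h>

	int
	main(void)
	{
		double bw = 100.0 * 1000 * 1000;	/* 100 MByte/sec */
		double delay = 0.010;			/* 10ms latency */
		double rsize = 32.0 * 1024;		/* 32 KByte reads */

		/* bytes that must be in flight to keep the pipe full */
		double inflight = bw * delay;		/* 1,000,000 bytes */

		printf("reads in flight needed: %.1f\n", inflight / rsize);
		/* ~30.5, so NFS_MAXRAHEAD = 32 is just barely enough */
		return 0;
	}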


Jonathan -- it would be really useful if you were to write up some text 
for the nfsd(8) and mount_nfs(8) manual pages providing some tuning 
advice :-)

> I'm also wondering how much we could gain by turning worker nfsd's
> (i.e., not the master process which listens for inbound TCP
> connections) into kthreads.  Seems like the same approach used for
> nfsiod's would also work here; and (on some arches at least) switching
> from one kthread to another should not require an mmu context-switch.
> Given a hundred-odd nfsds and a machine not doing much else, the
> savings would add up.

Yah.  What I'd like to see here is an arrangement like:

	1. nfsd does not fork the server worker threads, but only
	   does the rpcbind / listen stuff.

	2. Kernel has "max" and "min" NFS-server-worker-thread
	   parameters.  It starts with "min" when the kernel boots
	   up (default and lower bound of 1?).

	3. As requests come in, if there is not an NFS-server-worker
	   ready to handle the request, then another is created, up
	   to the "max".

	4. As a future enhancement, use the NFS-server-workers in a
	   LIFO fashion, and destroy workers that are idle for some
	   tunable period of time.  Combined with dynamic creation
	   of workers, this allows the system to continuously adapt
	   to the client load.

I'd like to see a similar scheme used for the client-side NFS threads, 
too (nfsiod).
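
To make the idea concrete, here's a rough, self-contained userland
sketch of the grow-on-demand / shrink-when-idle part of that scheme.
It uses plain pthreads rather than kthreads, all of the names and
numbers are invented for illustration, and the LIFO ordering of idle
workers from item 4 is omitted for brevity:

	#include <errno.h>
	#include <pthread.h>
	#include <stdio.h>
	#include <time.h>
	#include <unistd.h>

	#define MIN_WORKERS	1	/* started at "boot" (item 2) */
	#define MAX_WORKERS	8	/* ceiling on dynamic growth (item 3) */
	#define IDLE_TIMEOUT	2	/* idle seconds before exiting (item 4) */

	static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
	static pthread_cond_t wakeup = PTHREAD_COND_INITIALIZER;
	static int nworkers, nidle, pending;

	static void *
	worker(void *arg)
	{
		pthread_mutex_lock(&lock);
		for (;;) {
			struct timespec ts;

			clock_gettime(CLOCK_REALTIME, &ts);
			ts.tv_sec += IDLE_TIMEOUT;

			nidle++;
			while (pending == 0) {
				if (pthread_cond_timedwait(&wakeup, &lock,
				    &ts) != ETIMEDOUT)
					continue;
				if (pending == 0 && nworkers > MIN_WORKERS) {
					/* idle past the timeout: shrink */
					nidle--;
					nworkers--;
					pthread_mutex_unlock(&lock);
					return NULL;
				}
				/* must stay at the minimum: re-arm timer */
				clock_gettime(CLOCK_REALTIME, &ts);
				ts.tv_sec += IDLE_TIMEOUT;
			}
			nidle--;
			pending--;
			pthread_mutex_unlock(&lock);

			usleep(5000);	/* stand-in for servicing one RPC */

			pthread_mutex_lock(&lock);
		}
	}

	static void
	start_worker(void)		/* call with the lock held */
	{
		pthread_t t;

		nworkers++;
		pthread_create(&t, NULL, worker, NULL);
		pthread_detach(t);
	}

	static void
	dispatch_request(void)
	{
		pthread_mutex_lock(&lock);
		pending++;
		/* no idle worker for this request: create one, up to max */
		if (nidle == 0 && nworkers < MAX_WORKERS)
			start_worker();
		pthread_cond_signal(&wakeup);
		pthread_mutex_unlock(&lock);
	}

	int
	main(void)
	{
		int i;

		pthread_mutex_lock(&lock);
		for (i = 0; i < MIN_WORKERS; i++)	/* start the minimum */
			start_worker();
		pthread_mutex_unlock(&lock);

		for (i = 0; i < 20; i++) {	/* simulate a request burst */
			dispatch_request();
			usleep(1000);
		}
		sleep(1);

		pthread_mutex_lock(&lock);
		printf("workers now: %d of max %d\n", nworkers, MAX_WORKERS);
		pthread_mutex_unlock(&lock);
		return 0;
	}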

         -- Jason R. Thorpe <thorpej@wasabisystems.com>
