Subject: How to get good NFS write performance with RAIDframe
To: Andrea <practive@practive.org>
From: Thor Lancelot Simon <tls@rek.tjls.com>
List: port-i386
Date: 07/17/2001 21:05:38
On Wed, Jul 18, 2001 at 12:25:55AM +0000, Andrea wrote:
> Hi all!
> 
> I'm using a netbsd machine as file server via NFS.
> 
> The storage is a RAID 5 (RAIDframe) array of 4 IDE disks, FFS-formatted +

This is a particularly poor array configuration to use for NFS file
service.  NFS generally has poor write performance and so does RAID5;
the combination is what's largely responsible for your results.

IDE disks are extremely cheap.  If you use larger disks in a mirrored
or stripe/mirror configuration, rather than a RAID5 configuration, you
will get write performance so much better that you will probably not need
to resort to hacks like asynchronous NFS.
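
As a sketch of what that could look like with RAIDframe: a minimal
configuration file for a simple two-disk mirror might be something like the
following (the wd0e/wd1e partitions and the stripe-unit size are assumptions
on my part; see raidctl(8) for the full procedure):

   # raid0.conf -- two-disk RAID 1 mirror (sketch; adjust device names)
   START array
   # numRow numCol numSpare
   1 2 0

   START disks
   /dev/wd0e
   /dev/wd1e

   START layout
   # sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level
   128 1 1 1

   START queue
   fifo 100

You would then configure it with something like "raidctl -C raid0.conf
raid0", write component labels with "raidctl -I <serial> raid0", and bring
the mirror into sync with "raidctl -i raid0" before running newfs on it.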

The risk you run by using async NFS mounts is that if any client reboots
or crashes unexpectedly, file data that your application thought had been
safely written to disk may not have been; data-corruption problems can
also ensue when two clients write to the same file, even if they carefully
write to different regions of it.

To increase NFS performance, I recommend the following:

1) Unless your network is saturated, increase your NFS I/O size to 32K
   for both read and write, with -r32768 -w32768 as mount options (see
   the example fstab entry after this list).

2) If the clients are running NetBSD, use the Not Quite NFS extensions to
   the protocol to let the clients cache writes safely (the -q mount
   option, also shown in the example below).  Also consider increasing
   the clients' buffer cache size.

3) If you are willing to re-create the underlying filesystem, I strongly
   suggest using a larger block size and cylinder group size (unless your
   filesystem contains primarily very small files).  This avoids
   unnecessary seeks on write, which really hurt performance on RAID.  Try
   creating the filesystem with -b32768 -f4096 -c1024; if newfs rejects
   the -c value, reduce it to the maximum it will accept.  A sample
   invocation follows this list.  Also, see the note below about needing
   a different block size and maxcontig for some RAID5 configurations
   (including yours, if you choose to keep the current RAID5 setup) to
   avoid partial-stripe writes.
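
For concreteness, steps 1 and 2 combine into a single fstab entry on the
client.  A sketch, with a placeholder server name and mount point:

   # /etc/fstab on a NetBSD client (server:/export and /mnt are placeholders)
   server:/export   /mnt   nfs   rw,-r32768,-w32768,-q

The equivalent one-off command would be something like
"mount_nfs -r 32768 -w 32768 -q server:/export /mnt".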
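
And a sketch of the newfs invocation from step 3 (the device name rraid0e
is an assumption; substitute the raw device of your RAID partition):

   newfs -b 32768 -f 4096 -c 1024 /dev/rraid0e

If you keep RAID 5, read the note at the end of this message first: the
block size and maxcontig need more care there.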

These steps should increase your NFS write performance well beyond 
1.7MB/sec; in fact, any one of them may do so by itself.  On the other hand,
so might running RAID 1 (mirroring) or RAID 0+1 (stripe of mirrors) rather
than RAID 5.

With all of the steps listed above, but still running RAID 5, one of my
fileservers gets these results:

| bash-2.05$ cd /vv
| bash-2.05$ ls
| bash-2.05$ dd if=/dev/zero of=test bs=65536 count=1024
| 1024+0 records in
| 1024+0 records out
| 67108864 bytes transferred in 15 secs (4473924 bytes/sec)

Here is the mount configuration on the client I tested from:

| reddwarf.cs.stevens-tech.edu:/vv   /vv   nfs   rw,nosuid,nodev,-w32768,-r32768,-i,-q

The server filesystem has 16384-byte blocks, 2048-byte fragments, and cpg 60
(the maximum I could get with 16384-byte blocks).  It's a RAID 5 of four
disks, so I set the filesystem block size to 16384 and used maxcontig 3
(the -a 3 option to newfs or tunefs) to ensure that I would write only 48K
at a time, avoiding the RAID5 short-write penalty on every clustered write.
If I'd used 32K blocks, I couldn't have avoided it; this is another
disadvantage of RAID5: it artificially constrains the maximum filesystem
block size if you want decent performance.
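
To spell out the arithmetic behind those numbers: four disks in RAID 5
leave three data disks per stripe, and writing 48K across three disks
implies a 16K stripe unit, so one cluster of three 16K blocks is exactly
one full data stripe and parity can be written without a read-modify-write.
With 32K blocks, a full data stripe would presumably be 3 x 32K = 96K,
more than the 64K the kernel will move in a single clustered transfer, so
partial-stripe writes would be unavoidable.  A sketch of the invocations
(the device name is again an assumption):

   # create the filesystem to match the 48K data stripe
   newfs -b 16384 -f 2048 -c 60 -a 3 /dev/rraid0e

   # or adjust maxcontig alone on an existing filesystem
   tunefs -a 3 /dev/rraid0e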

Thor