Subject: Re: NFS transport
To: None <tls@rek.tjls.com>
From: Jonathan Stone <jonathan@DSG.Stanford.EDU>
List: tech-kern
Date: 07/23/2002 12:19:36
Rick, does NetBSD really turn on > 8K reads and writes, by default, for
NFSv2?  That's just ... broken. I assumed the 32k was for NFSv3 only.


In message <20020723181849.GA23466@rek.tjls.com>, Thor Lancelot Simon writes:
>On Tue, Jul 23, 2002 at 01:59:53PM -0400, rick@snowhite.cis.uoguelph.ca wrote:

... not that the Penguin-OS is on topic, but in the days of Linux's
userland NFS server, UDP was probably better: it's cheaper to do IP
reassembly and pay the copyout/context-switch cost only once per RPC,
rather than once per TCP segment.  If we have arches where switching to an
nfsd kthread requires a full-blown MMU context switch, the same may apply.
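
For a rough sense of how much IP reassembly a single RPC involves, here
is a small sketch that counts fragments per UDP datagram. It assumes
plain IPv4 with a 20-byte header, an 8-byte UDP header and a 1500-byte
Ethernet MTU, and it ignores RPC/NFS header overhead; the function name
is made up for illustration.

```python
import math

IP_HEADER = 20   # assumption: IPv4 header without options
UDP_HEADER = 8   # fixed UDP header size


def fragments_per_datagram(udp_payload, mtu=1500):
    """Number of IP fragments needed to carry one UDP datagram."""
    # Every fragment except the last must carry a multiple of 8 bytes.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    return math.ceil((udp_payload + UDP_HEADER) / per_fragment)


print(fragments_per_datagram(8 * 1024))    # 6
print(fragments_per_datagram(32 * 1024))   # 23
```

Under those assumptions an 8k transfer is reassembled from 6 fragments
and a 32k transfer from 23, with one copyout per reassembled RPC.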

[...]

>In addition to the inherent win of
>avoiding the reassembly process, you may win because your NIC hardware can
>offload the TCP checksum 

True. OTOH, my experience is that, at least to a first approximation,
relying on outboard checksumming only makes sense if you don't care
whether the data transferred is correct or not. There are just too
many bugs in too many versions of NIC firmware/hardware/DMA engines,
which software checksumming catches but outboard checksumming doesn't.
My guess is this won't change much (if at all) until mass-market OSes
(Windows) start to really use IP/TCP/UDP assist.

YMMV on any or all of the above.
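
For reference, the software check in question is the 16-bit
one's-complement Internet checksum. The following is only a sketch of
that sum over a byte string; it leaves out the TCP/UDP pseudo-header, and
the function name is hypothetical rather than anything from the kernel.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum over data (no pseudo-header)."""
    if len(data) % 2:
        data += b"\x00"              # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]   # big-endian 16-bit words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


payload = bytes.fromhex("0001f203f4f5f6f7")
csum = internet_checksum(payload)
print(hex(csum))  # 0x220d

# Receiver side: summing data plus its checksum yields zero when intact.
print(internet_checksum(payload + csum.to_bytes(2, "big")))  # 0
```

A host that recomputes this after the DMA into memory will notice
corruption introduced between the wire and the buffer; a checksum
verified only on the card cannot.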

> -- even many of those that can offload the UDP
>checksum can't do so for fragmented UDP datagrams, and even for some of the
>ones that can, our drivers can't.

The memory costs of buffering multiple 32k NFS I/Os from multiple
clients quickly get ... excessive. Especially for memory systems capable
of ca. 4 Gbit/s (1 Gbit/s fragmented in, 1 Gbit/s reassembled out, and
vice-versa in the other direction).  From that perspective, even the
3Com 710024 looks underbuffered. (Anyone know what kind of chipset it
has, or if any of the existing drivers could be mangled to work with it?)
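
A back-of-the-envelope sketch of both numbers, for anyone who wants to
plug in their own figures. The client count and the number of
outstanding I/Os per client below are invented for illustration, and the
function names are hypothetical.

```python
def reassembly_buffer_bytes(clients, outstanding_per_client,
                            io_size=32 * 1024):
    """Worst-case memory held by in-flight I/Os awaiting reassembly."""
    return clients * outstanding_per_client * io_size


def memory_traffic_gbits(link_gbits=1.0):
    """Memory bandwidth used when both directions of the link are full."""
    per_direction = 2 * link_gbits   # fragmented in + reassembled out
    return 2 * per_direction         # receive side plus transmit side


# Assumed load: 100 clients with 8 outstanding 32k I/Os each.
print(reassembly_buffer_bytes(100, 8) / 2**20)  # 25.0 (MiB)
print(memory_traffic_gbits())                    # 4.0 (Gbit/s)
```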

This is one of the gotchas for iSCSI: provide enough buffering to
cope with realistic TCP (or fragmented UDP) drop rates, and the cost
differential between iSCSI and FC shrinks dramatically.
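
To illustrate why drop rates matter so much for fragmented UDP: losing
any one fragment loses the whole datagram. The sketch below assumes
fragment losses are independent (real losses tend to be bursty) and uses
23 fragments, which is roughly what a 32k datagram needs on a 1500-byte
MTU; the function name is hypothetical.

```python
def datagram_loss_rate(fragment_loss, fragments):
    """Probability that at least one fragment of a datagram is lost."""
    return 1.0 - (1.0 - fragment_loss) ** fragments


# Assumed per-fragment loss rates of 0.1% and 1%.
for p in (0.001, 0.01):
    print(p, round(datagram_loss_rate(p, 23), 3))
```

At a 1% per-fragment loss rate that works out to roughly one datagram
in five needing a full retransmit, each of which has to be buffered
somewhere in the meantime.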