Subject: Re: [HACKERS] PostgreSQL, NetBSD and NFS
To: None <pgsql-hackers@postgresql.org, current-users@netbsd.org>
From: Thor Lancelot Simon <tls@rek.tjls.com>
List: current-users
Date: 02/05/2003 15:31:49
On Wed, Feb 05, 2003 at 03:09:09PM -0500, Tom Lane wrote:
> "D'Arcy J.M. Cain" <darcy@druid.net> writes:
> > On Wednesday 05 February 2003 13:04, Ian Fry wrote:
> >> How about adjusting the read and write-size used by the NetBSD machine? I
> >> think the default is 32k for both read and write on i386 machines now.
> >> Perhaps try setting them back to 8k (it's the -r and -w flags to mount_nfs,
> >> IIRC)
> 
> > Hey!  That did it.
> 
> Hot diggety!
> 
> > So, why does this fix it?

Who knows.  One thing that I'd be interested to know is whether D'Arcy is
using NFSv2 or NFSv3: NFSv2 caps a single READ or WRITE at 8K of data,
so 32K requests are not, strictly speaking, within the bounds of the v2
specification.  If he is using UDP rather than TCP
as the transport layer, another potential issue is that 32K requests will
end up as IP packets with a very large number of fragments, potentially
exposing some kind of network stack bug in which the last fragment is
dropped or corrupted (I would suspect that the likelihood of such a bug
in the NetApp stack is quite low, however).  If feasible, it is probably
better to use TCP as the transport and let it handle segmentation whether
the request size is 8K or 32K.
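For a rough sense of scale, here is a small Python sketch of the fragment
count.  It assumes IPv4 with no options over a 1500-byte Ethernet MTU and
ignores the RPC and NFS headers (the `rpc_overhead` parameter is a
hypothetical knob for them); `ip_fragments` is just an illustrative helper,
not anything from the NetBSD or NetApp code.

```python
import math

IP_HEADER = 20   # bytes: IPv4 header without options (assumption)
UDP_HEADER = 8   # bytes: fixed UDP header


def ip_fragments(nfs_payload: int, mtu: int = 1500, rpc_overhead: int = 0) -> int:
    """Number of IPv4 fragments needed to carry one UDP datagram.

    nfs_payload  -- bytes of file data in the READ reply or WRITE call
    rpc_overhead -- assumed RPC + NFS header bytes (hypothetical; varies per call)
    """
    udp_len = UDP_HEADER + rpc_overhead + nfs_payload
    # Every fragment except the last must carry a multiple of 8 data bytes.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    return math.ceil(udp_len / per_fragment)


for size in (8 * 1024, 32 * 1024):
    print(f"{size // 1024}K request -> {ip_fragments(size)} fragments")
```

Under those assumptions an 8K request needs 6 fragments and a 32K request
needs 23.  Since the receiver throws away the whole datagram if any single
fragment goes missing, a 32K request over UDP gives a marginal stack (or
switch) nearly four times as many chances to lose it.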

> I think now you file a bug report with the NetBSD kernel folk.  My
> thoughts are running in the direction of a bug having to do with
> scattering a 32K read into multiple kernel disk-cache buffers or
> gathering together multiple cache buffer contents to form a 32K write.

That doesn't make much sense to me.  Pages on i386 are 4K, so whether he
does 8K writes or 32K writes, the data will always span multiple pages in
the page cache.
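To make that concrete, a tiny Python sketch that counts the pages a byte
range touches.  The 4K figure is the i386 page size; `pages_touched` is
just an illustrative helper.

```python
PAGE_SIZE = 4096  # i386 page size in bytes


def pages_touched(offset: int, length: int, page_size: int = PAGE_SIZE) -> int:
    """Count the distinct pages covered by the byte range [offset, offset + length)."""
    first = offset // page_size
    last = (offset + length - 1) // page_size
    return last - first + 1


print(pages_touched(0, 8192))     # 2: an aligned 8K request
print(pages_touched(0, 32768))    # 8: an aligned 32K request
print(pages_touched(1024, 8192))  # 3: an unaligned 8K request
```

Neither size fits in a single page, so a bug in assembling a request from
several pages would presumably bite at 8K as well.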

> Unless NetBSD has changed from its heritage, the kernel disk cache
> buffers are 8K, and so an 8K NFS read or write would never cross a
> cache buffer boundary.  But 32K would.

I don't know what "heritage" you're referring to, but it has never been
the case that NetBSD's buffer cache has used fixed-size 8K disk buffers,
and I don't believe that it was ever the case for any Net2 or 4.4-derived
system.

> Or it could be a similar bug on the NFS server's side?

That's conceivable.  Of course, a client bug is quite possible as well,
but I don't think the mechanism you suggest is likely.

-- 
 Thor Lancelot Simon	                                      tls@rek.tjls.com
   But as he knew no bad language, he had called him all the names of common
 objects that he could think of, and had screamed: "You lamp!  You towel!  You
 plate!" and so on.              --Sigmund Freud