Subject: Re: NFS issues on NetBSD
To: None <current-users@netbsd.org>
From: Thor Lancelot Simon <tls@rek.tjls.com>
List: current-users
Date: 01/23/2003 13:21:30
On Thu, Jan 23, 2003 at 06:54:36PM +0100, Lennart Augustsson wrote:
> >
> >FWIW, I use NetBSD as a NetApp client every day, and haven't seen any
> >suspicious behaviour.
> > 
> >
> I've seen problems with NetBSD and NetApp.  This is especially true for
> TCP mounts, they just hang after a while.  With UDP mounts it gets 
> non-responsive for a while, but then recovers.

FWIW, Panix stopped using TCP mounts with their NetApps (and NetBSD
clients) years ago because of this issue.  Every once in a while I
persuade them to give it a spin again, but eventually they hit the same
issue (it's gotten better, but it's not gone away) and give up.

They also have occasional UDP "hangs" like the ones Lennart describes,
but I believe that pretty much only happens when the NetApp's doing
snapshots.  Someone from Panix will surely pipe up and correct me
about this if I'm wrong; I know a few of their admins read this
list.

For my own part, I'm chasing some kind of _nasty_ NQNFS problem
that happens with NetBSD clients on a NetBSD server (all -current
as of a week or two ago).  Sometimes the system gets into a state
where if I do this:

client1% touch /share/foo
client2% rm /share/foo
client1% ls /share/foo

I/O on both clients hangs when I do the 'rm' and stays hung until
client1's lease expires.  Neither client shows any "evict" RPCs
received, which, as I understand the protocol, is just plain wrong.
When client2 does the rm, the server should evict client1; certainly
everything should _not_ hang until client1's lease times out,
particularly not I/O originated by client1 (e.g. the 'ls')!
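
The expectation above can be put as a toy model.  This is only a sketch
of the behaviour described in this message, not the NQNFS wire protocol
or the NetBSD implementation: the class name LeaseServer, the 30-second
lease, and the assumption that an evicted client vacates immediately are
all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LeaseServer:
    """Toy lease server: hypothetical, for illustration only."""
    lease_seconds: float = 30.0      # assumed lease duration
    send_evictions: bool = True      # False models the hang described above
    holders: dict = field(default_factory=dict)    # path -> {client: expiry}
    evictions: list = field(default_factory=list)  # (client, path) notices

    def get_lease(self, client, path, now):
        # Record that `client` holds a lease on `path` until now + lease time.
        self.holders.setdefault(path, {})[client] = now + self.lease_seconds

    def modify(self, client, path, now):
        """Return the time at which a conflicting change can proceed."""
        leases = self.holders.get(path, {})
        # Unexpired leases held by anyone other than the modifying client.
        others = {c: t for c, t in leases.items() if c != client and t > now}
        if not others:
            return now
        if self.send_evictions:
            # Expected behaviour: notify each holder, then proceed.
            # (Assumes holders vacate at once; a real client takes some time.)
            for c in others:
                self.evictions.append((c, path))
                del leases[c]
            return now
        # Observed behaviour: nothing is sent, so the change is stuck
        # until the last conflicting lease runs out.
        return max(others.values())


good = LeaseServer()
good.get_lease("client1", "/share/foo", now=0.0)
print(good.modify("client2", "/share/foo", now=5.0), good.evictions)

bad = LeaseServer(send_evictions=False)
bad.get_lease("client1", "/share/foo", now=0.0)
print(bad.modify("client2", "/share/foo", now=5.0), bad.evictions)
```

In the first case the change goes through at t=5 with one eviction
recorded for client1; in the second it waits until t=30 and no eviction
is ever sent, which matches the symptom seen on the wire.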

Any hints would be much appreciated.  NQ is greatly improving our
write performance, so we'd prefer to not turn it off.

We haven't seen any TCP mount issues ourselves, but if we do I
intend to chase them rather than just switching to UDP, for what
may not be an immediately obvious reason: using UDP defeats hardware
checksum offload on many gigabit adapters, since most can't checksum
UDP packets that are fragmented on the wire.  We have a few potential
users of this cluster who work with extremely large datasets and 
actually care about I/O performance, which is probably a bit odd
for a compute-cluster application, but there you have it. :-)
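
To see how quickly NFS over UDP runs into fragmentation, here is a rough
calculation.  It is a sketch under stated assumptions: IPv4 with a
20-byte header and no options, an 8-byte UDP header, and a 1500-byte
Ethernet MTU by default.  The function name ip_fragments is made up, and
the payload length is taken to include whatever RPC/NFS headers ride
along with the data.

```python
import math


def ip_fragments(udp_payload_len, mtu=1500, ip_header=20, udp_header=8):
    """Return how many IPv4 fragments one UDP datagram is split into."""
    # The IP payload is the UDP header plus the UDP payload.
    ip_payload = udp_header + udp_payload_len
    # Every fragment except the last must carry a multiple of 8 bytes.
    per_fragment = (mtu - ip_header) // 8 * 8
    return max(1, math.ceil(ip_payload / per_fragment))


for size in (1472, 1473, 8192, 32768):
    print(size, ip_fragments(size))
print("8192 with a 9000-byte MTU:", ip_fragments(8192, mtu=9000))
```

An 8192-byte payload becomes 6 fragments at a 1500-byte MTU, and the UDP
checksum covers the whole datagram rather than each fragment, which is
why an adapter that only sees one frame at a time can't compute it.
With a 9000-byte MTU the same payload fits in a single packet.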

-- 
 Thor Lancelot Simon	                                      tls@rek.tjls.com
   But as he knew no bad language, he had called him all the names of common
 objects that he could think of, and had screamed: "You lamp!  You towel!  You
 plate!" and so on.              --Sigmund Freud