Subject: NFS server problems on sparc/sparc64?
To: None <port-sparc64@NetBSD.org, netbsd-users@NetBSD.org,>
From: John D.Baker <jdbaker@mylinuxisp.com>
List: port-sparc64
Date: 11/27/2007 18:12:01
I've been running NetBSD-4.0_RC? on my file servers for some time and
have been seeing lots of "nfs send error 32" messages on the console,
with clients reporting streams of "nfs server 'foo' not responding" and
"nfs server 'foo' is alive again".

After a while, all NFS operations cease.  If I do
'/etc/rc.d/nfs restart', it reports all the nfsd process PIDs it's
waiting to terminate, restarts the nfs service, and the stalled clients
pick up where they left off.

Occasionally (and more frequently if the load is heavy enough), the
server will incur a "Watchdog Reset" and be dropped to the OpenBoot
prompt.

"halloran"
    SS5-85MHz, 128MB, hme, ISP wide single-ended, 4GB system disk (esp0)
    2 18GB SCSI drives in RAIDframe RAID 1 (isp0)

"deepdish"
    Ultra5-333MHz, 256MB, Adaptec AHC3940UW (ahc), 8GB system disk
    (cmdide0/wd0), 6 4GB SCSI drives in RAIDframe RAID-R (RAID 5
    w/rotated spare) (ahc0).

Both behave the same way, although "halloran" used to never complain
even under heavy load.

Actually, I can't confirm Watchdog Resets on "deepdish" since there
wasn't any terminal attached when it crashed.  On a couple of occasions,
it has been so thoroughly wedged that it required a power-cycle to
regain control of the machine.

At present, all clients automount filesystems from the servers via
amd.  I tried setting the rsize/wsize to something that would fit in a
single Ethernet frame, but it did not change the behavior.
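
For reference, a rough sketch of the arithmetic behind "fits in a single
Ethernet frame".  The header sizes assume plain IPv4 over standard
1500-byte Ethernet, and RPC_NFS_OVERHEAD is a guessed allowance for the
RPC and NFS headers, not a measured value; the function name is
hypothetical.

```python
# Rough byte budget for one NFS-over-UDP transfer in a single Ethernet frame.
# All constants are assumptions for illustration, not measured values.
ETHERNET_MTU = 1500      # standard Ethernet payload size, in bytes
IPV4_HEADER = 20         # IPv4 header without options
UDP_HEADER = 8           # fixed UDP header size
RPC_NFS_OVERHEAD = 200   # assumed allowance for RPC + NFS headers


def max_unfragmented_size(mtu=ETHERNET_MTU, overhead=RPC_NFS_OVERHEAD):
    """Return the largest power-of-two size that fits in the budget."""
    budget = mtu - IPV4_HEADER - UDP_HEADER - overhead
    if budget < 1:
        raise ValueError("no room left for data in a single frame")
    size = 1
    while size * 2 <= budget:
        size *= 2
    return size


if __name__ == "__main__":
    print(max_unfragmented_size())  # prints 1024 with the defaults above
```

Under these assumptions, an rsize/wsize of 1024 is the largest
power-of-two value that should avoid IP fragmentation.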

/usr/include/sys/errno.h says "32" is EPIPE, broken pipe.  Not sure
what that means in the context of NFS over UDP though.
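
A quick way to double-check the number-to-name mapping without reading
the header is sketched below.  It only reports what the local C library
says about the error number; the helper name describe_errno is made up
for illustration, and the result says nothing about why the NFS code
reports that error.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for an error number on this system."""
    name = errno.errorcode.get(code, "unknown")  # e.g. "EPIPE"
    message = os.strerror(code)                  # text from the C library
    return name, message


if __name__ == "__main__":
    name, message = describe_errno(32)
    print(f"errno 32 = {name}: {message}")
```

On BSD and Linux systems this should report EPIPE for 32, matching
errno.h.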

At present, all but one client are peecee machines running
NetBSD-4.0_RC4/i386.  The one other is an SS20 running
NetBSD-4.0_RC4/sparc.

The i386 machines also export their filesystems over NFS and I have
never seen any behavior such as I have described from them.  They
do not have the overhead of RAIDframe, though.

Thanks.
--
John D. Baker                            NetBSD     Darwin/MacOS X
http://mylinuxisp(dot)com/(tilde)jdbaker/     OpenBSD            FreeBSD
BSD.  It just sits there and _works_.
GPG fingerprint = D703 4A7E 479F 63F8 D3F4  BD99 9572 8F23 E4AD 1645