Current-Users archive


Re: NFS client renders system unusable



Michael van Elst wrote:
On Sat, Mar 15, 2008 at 5:39 AM, Sarton O'Brien 
<bsd-xen%roguewrt.org@localhost> wrote:
 >> yp_order: clnt_call: RPC: Unable to send; errno = No buffer space
 >> available
I would guess that when the NFS connection stalls his application
just piles up connections. The number of open but waiting connections
grows beyond bounds and this is eating the (network) buffer space so
that finally even the single NFS connection fails due to lack
of memory.

The answer is to limit the number of open connections in his
application or if there is already a limit, to provide the
buffer resources necessary. With NetBSD this is the
kern.mbuf.nmbclusters value and maybe vm.nkmempages too.

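As a rough illustration of the "limit the number of open connections" advice, here is a minimal sketch in Python. It is not mldonkey's code (mldonkey has its own settings and is not written in Python); the class name ConnectionLimiter and its methods are hypothetical, and the only assumption is that a connection object has a close() method. The idea is that a new connection is refused outright when the cap is reached, instead of being left to wait and hold buffer space.

```python
import threading


class ConnectionLimiter:
    """Hypothetical helper: cap the number of simultaneously open connections."""

    def __init__(self, max_open):
        # One semaphore slot per allowed open connection.
        self._slots = threading.BoundedSemaphore(max_open)

    def try_open(self, connect):
        """Call connect() if a slot is free; return None when the cap is hit."""
        # Non-blocking acquire: refuse rather than queue up waiting connections.
        if not self._slots.acquire(blocking=False):
            return None
        try:
            return connect()
        except Exception:
            # The connection never opened, so give the slot back.
            self._slots.release()
            raise

    def close(self, conn):
        """Close the connection and free its slot."""
        try:
            conn.close()
        finally:
            self._slots.release()
```

The cap itself would still need to be chosen to fit the buffer space available (kern.mbuf.nmbclusters on NetBSD); the sketch only shows the refusal logic.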
Raising kern.mbuf.nmbclusters has stopped the error above. The system now at least responds, but NFS still eventually locks up:

# /etc/rc.d/mldonkey stop
Stopping mldonkey.

Waiting for PIDS: 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904nfs server 192.168.15.8:/usr/local/svc/mldonkey: not responding , 904, 904, 904nfs server 192.168.15.8:/usr/local/svc/mldonkey: is alive again , 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904, 904

And it's highly unlikely it will ever actually terminate without a kill -9.


# ps axsw
 UID   PID  PPID CPU LID NLWP PRI  NI   VSZ   RSS WCHAN    STAT TTY        TIME COMMAND
   0     0     0   0  29   26  96   0     0 15436 -        ZWL- ?       0:18.28 [system]
   0     0     0   0  26   26  96   0     0 15436 netio    DWL- ?       0:18.28 [system]
   0     0     0   0  25   26  96   0     0 15436 nfsrcv   DWL- ?       0:18.28 [system]
   0     0     0   0  24   26  96   0     0 15436 nfsrcv   DWL- ?       0:18.28 [system]
   0     0     0   0  23   26  96   0     0 15436 nfsiod   DWL- ?       0:18.28 [system]
   0     0     0   0  22   26  96   0     0 15436 nfsiod   DWL- ?       0:18.28 [system]
   0     0     0   0  21   26  96   0     0 15436 nfsiod   DWL- ?       0:18.28 [system]
   0     0     0   0  20   26  96   0     0 15436 nfsiod   DWL- ?       0:18.28 [system]
   0     0     0   0  19   26  96   0     0 15436 nfsrcv   DWL- ?       0:18.28 [system]
   0     0     0   0  18   26 123   0     0 15436 physiod  DWL- ?       0:18.28 [system]
   0     0     0   0  17   26 125   0     0 15436 vmem_reh DWL- ?       0:18.28 [system]
   0     0     0   0  16   26 125   0     0 15436 aiodoned DWL- ?       0:18.28 [system]
   0     0     0   0  15   26 124   0     0 15436 syncer   DWL- ?       0:18.28 [system]
   0     0     0   0  14   26 126   0     0 15436 pgdaemon DWL- ?       0:18.28 [system]
   0     0     0   0  13   26  96   0     0 15436 rdst     DWL- ?       0:18.28 [system]
   0     0     0   0  12   26  96   0     0 15436 evtsq    DWL- ?       0:18.28 [system]
   0     0     0   0  11   26  96   0     0 15436 crypto_w DWL- ?       0:18.28 [system]
   0     0     0   0   9   26  96   0     0 15436 pmfevent DWL- ?       0:18.28 [system]
   0     0     0   0   8   26 125   0     0 15436 vrele    DWL- ?       0:18.28 [system]
   0     0     0   0   7   26 127   0     0 15436 xcall    DWL- ?       0:18.28 [system]
   0     0     0   0   6   26 223   0     0 15436 -        RWL- ?       0:18.28 [system]
   0     0     0   0   5   26 220   0     0 15436 -        RWL- ?       0:18.28 [system]
   0     0     0   0   4   26 221   0     0 15436 -        RWL- ?       0:18.28 [system]
   0     0     0   0   3   26 222   0     0 15436 -        RWL- ?       0:18.28 [system]
   0     0     0   0   2   26   0   0     0 15436 -        RWL- ?       0:18.28 [system]
   0     0     0   0   1   26 125   0     0 15436 schedule DW   ?       0:18.28 [system]
   0     1     0   0   1    1  85   0   748   800 wait     DW   ?       0:00.01 init
   0   102     1   0   1    1  85   0   752   944 select   DW   ?       0:00.51 /usr/sbin/ypbind
   0   106     1   0   1    1  85   0   756   536 kqread   DW   ?       0:00.03 /usr/sbin/syslogd -s
   0   131     1   0   1    1  85   0   876   256 select   DW   ?       0:00.02 /usr/sbin/rpcbind -l
   0   189     1   0   1    1  85   0   756   260 select   DW   ?       0:00.02 /usr/sbin/rpc.statd
   0   201     1   0   1    1  85   0   756   272 select   DW   ?       0:00.02 /usr/sbin/rpc.lockd
   0   247     1   0   1    1  85   0  1780  5524 pause    DW   ?       0:00.75 /usr/sbin/ntpd
   0   250     1 489   1    1  85   0   744     4 kqread   DW   ?       0:00.00 /usr/sbin/powerd
   0   270     1   0   1    1  85   0   764     4 select   DW   ?       0:00.00 /usr/sbin/sshd
   0   379     1   0   1    1  85   0   752   900 kqread   DW   ?       0:00.06 /usr/libexec/postfix/master
  12   400   379   0   1    1  85   0   752   676 kqread   DW   ?       0:00.07 qmgr -l -t unix -u
   0   414     1   0   1    1  85   0   752   484 nanoslp  DW   ?       0:00.03 /usr/sbin/cron
   0   459   270   0   1    1  85   0   824  1556 netio    DW   ?       0:00.07 sshd: roguetr [priv]
1000   469   459   0   1    1  85   0   824  2752 select   DW   ?       0:00.87 sshd: roguetr@ttyp0
  12   636   379   0   1    1  85   0   752  1252 kqread   DW   ?       0:00.02 pickup -l -t fifo -u
   0   377   463   0   1    1  85   0   748   876 pause    DW   ttyp0   0:00.03 -ksh
   0   463   468   0   1    1  86  -2   756     4 wait     DW   ttyp0   0:00.02 su -
1000   468   469   0   1    1  85   0   748     4 pause    DW   ttyp0   0:00.01 -ksh
1006   904     1   0   2    2  24  19  7904 37920 parked   DW-  ttyp0   0:05.07 /usr/pkg/libexec/mldonkey/mlnet -pid /var/run/mldonkey
1006   904     1   0   1    2  76  19  7904 37920 nfsrcv   DW   ttyp0   0:05.07 /usr/pkg/libexec/mldonkey/mlnet -pid /var/run/mldonkey
   0  3640   377   0   1    1  85   0   748  1108 wait     DW   ttyp0   0:00.06 /bin/sh /etc/rc.d/mldonkey stop
   0  5031  3640   0   1    1  85   0    20   724 nanoslp  DW   ttyp0   0:00.00 sleep 2
   0   354     1   7   1    1  85   0   756  2064 wait     DW   xencons 0:00.03 login
   0  3653   354 369   1    1  85   0   748  1100 pause    DW   xencons 0:00.04 -ksh
   0  5055  3653 369   1    1  43   0   748   792 -        RW   xencons 0:00.00 ps -axsw

The problem seems to occur when more files are added to the list maintained by mldonkey, which the program then manages over NFS. The program obviously wasn't designed for this use, but do you have any idea what I might need to do to accommodate it? I have already increased vfs.nfs.iothreads.

Thanks for your help so far.

Sarton

