Subject: RPC bug in ypbind ?
To: None <tech-userlevel@netbsd.org>
From: Manuel Bouyer <bouyer@antioche.lip6.fr>
List: tech-userlevel
Date: 09/02/2003 18:28:51
Hi,
I think I found a bug in ypbind in how svc_maxfd/svc_fdset is used.
But I'm not a TCP expert, so I may be wrong.
Occasionally ypbind hangs (a problem also reported by manu@). ps shows that
ypbind is blocked in select(), and netstat shows several TCP
connections with data in the receive queue:
desssrv:/var/crash#ps -axlww -M netbsd.1.core |grep ypbind
0 118 -1077951044 0 2 0 3428 0 select Ts ?? 0:00.00 /usr/sbin/ypbind
desssrv:/var/crash#netstat -f inet -M netbsd.1.core
Active Internet connections
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 56 0 localhost.1021 localhost.879 CLOSE_WAIT
tcp 0 0 localhost.879 localhost.1021 FIN_WAIT_2
tcp 0 0 desssrv.ssh sphinx.1017 ESTABLISHED
tcp 56 0 localhost.1021 localhost.922 CLOSE_WAIT
tcp 56 0 localhost.1021 localhost.923 CLOSE_WAIT
tcp 56 0 localhost.1021 localhost.924 CLOSE_WAIT
tcp 56 0 localhost.1021 localhost.925 CLOSE_WAIT
tcp 56 0 localhost.1021 localhost.926 CLOSE_WAIT
tcp 56 0 localhost.1021 localhost.927 CLOSE_WAIT
tcp 56 0 localhost.1021 localhost.928 CLOSE_WAIT
tcp 56 0 localhost.1021 localhost.929 CLOSE_WAIT
tcp 56 0 localhost.1021 localhost.930 CLOSE_WAIT
tcp 56 0 localhost.1021 localhost.931 CLOSE_WAIT
tcp 56 0 localhost.1021 localhost.932 CLOSE_WAIT
tcp 56 0 localhost.1021 localhost.933 CLOSE_WAIT
tcp 56 0 localhost.1021 localhost.934 CLOSE_WAIT
tcp 56 0 localhost.1021 localhost.935 CLOSE_WAIT
tcp 56 0 localhost.1021 localhost.936 CLOSE_WAIT
tcp 56 0 localhost.1021 localhost.937 CLOSE_WAIT
tcp 56 0 localhost.1021 localhost.938 CLOSE_WAIT
tcp 56 0 localhost.1021 localhost.939 CLOSE_WAIT
tcp 56 0 localhost.1021 localhost.940 CLOSE_WAIT
tcp 56 0 localhost.1021 localhost.941 CLOSE_WAIT
tcp 56 0 localhost.1021 localhost.942 CLOSE_WAIT
tcp 56 0 localhost.1021 localhost.943 CLOSE_WAIT
tcp 56 0 localhost.1021 localhost.944 CLOSE_WAIT
tcp 0 0 desssrv.ssh bacchus.891 ESTABLISHED
tcp 0 0 desssrv.ssh bacchus.892 ESTABLISHED
tcp 0 0 desssrv.ssh bacchus.894 ESTABLISHED
tcp 0 0 localhost.domain *.* LISTEN
tcp 0 0 desssrv264.domain *.* LISTEN
tcp 0 0 desssrv.domain *.* LISTEN
tcp 0 0 desssrv260.domain *.* LISTEN
udp 0 0 localhost.ntp *.*
udp 0 0 desssrv264.ntp *.*
udp 0 0 desssrv.ntp *.*
udp 0 0 desssrv260.ntp *.*
udp 0 0 localhost.domain *.*
udp 0 0 desssrv264.domain *.*
udp 0 0 desssrv.domain *.*
udp 0 0 desssrv260.domain *.*
desssrv:/var/crash#fstat -M netbsd.1.core |grep ypbind
root ypbind 118 wd - - none -
root ypbind 118 0 - - none -
root ypbind 118 1 - - none -
root ypbind 118 2 - - none -
root ypbind 118 3 - - none -
root ypbind 118 4* internet dgram udp c05ce5a0 *:1020
root ypbind 118 5* internet stream tcp c06f9768 *:1021
root ypbind 118 6* internet dgram udp c05ce660 *:65532
root ypbind 118 7* internet dgram udp c05ce6c0 *:65529
root ypbind 118 8 - - none -
root ypbind 118 9* internet stream tcp c0764b20 127.0.0.1:1021 <-> 127.0.0.1:944
root ypbind 118 10* internet stream tcp c07643b8 127.0.0.1:1021 <-> 127.0.0.1:943
root ypbind 118 11* internet stream tcp c0764140 127.0.0.1:1021 <-> 127.0.0.1:942
root ypbind 118 12* internet stream tcp c076427c 127.0.0.1:1021 <-> 127.0.0.1:941
root ypbind 118 13* internet stream tcp c07894f8 127.0.0.1:1021 <-> 127.0.0.1:940
root ypbind 118 14* internet stream tcp c0789144 127.0.0.1:1021 <-> 127.0.0.1:939
root ypbind 118 15* internet stream tcp c0789280 127.0.0.1:1021 <-> 127.0.0.1:938
root ypbind 118 16* internet stream tcp c07898ac 127.0.0.1:1021 <-> 127.0.0.1:937
root ypbind 118 17* internet stream tcp c0789b24 127.0.0.1:1021 <-> 127.0.0.1:936
root ypbind 118 18* internet stream tcp c0789d9c 127.0.0.1:1021 <-> 127.0.0.1:935
root ypbind 118 19* internet stream tcp c07b2148 127.0.0.1:1021 <-> 127.0.0.1:934
root ypbind 118 20* internet stream tcp c07b23c0 127.0.0.1:1021 <-> 127.0.0.1:933
root ypbind 118 21* internet stream tcp c07b2638 127.0.0.1:1021 <-> 127.0.0.1:932
root ypbind 118 22* internet stream tcp c07b28b0 127.0.0.1:1021 <-> 127.0.0.1:931
root ypbind 118 23* internet stream tcp c07b2b28 127.0.0.1:1021 <-> 127.0.0.1:930
root ypbind 118 24* internet stream tcp c07b2da0 127.0.0.1:1021 <-> 127.0.0.1:929
root ypbind 118 25* internet stream tcp c07b314c 127.0.0.1:1021 <-> 127.0.0.1:928
root ypbind 118 26* internet stream tcp c07b33c4 127.0.0.1:1021 <-> 127.0.0.1:927
root ypbind 118 27* internet stream tcp c07b363c 127.0.0.1:1021 <-> 127.0.0.1:926
root ypbind 118 28* internet stream tcp c07b38b4 127.0.0.1:1021 <-> 127.0.0.1:925
root ypbind 118 29* internet stream tcp c07b3b2c 127.0.0.1:1021 <-> 127.0.0.1:924
root ypbind 118 30* internet stream tcp c07b3da4 127.0.0.1:1021 <-> 127.0.0.1:923
root ypbind 118 31* internet stream tcp c07b5150 127.0.0.1:1021 <-> 127.0.0.1:922
root ypbind 118 33* internet stream tcp c07b3500 127.0.0.1:1021 <-> 127.0.0.1:879
root ypbind 118 34* internet stream tcp
Investigating further, I found that ypbind is select()ing only on file
descriptors 0 through 7 (the nfds passed to select() is 8).
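
To illustrate why that matters, here is a minimal sketch (not taken from
ypbind) showing that select() ignores any descriptor numbered nfds or higher,
even when it is set in the fd_set and has data pending. The helper names
poll_readable() and stale_width_misses_data() are made up for this example,
and a pipe stands in for the TCP connections.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: poll `fd` for readability with a zero timeout,
 * passing `width` as nfds. Returns select()'s result: the number of
 * ready descriptors, 0 if none, -1 on error. */
int poll_readable(int fd, int width)
{
    fd_set readfds;
    struct timeval tv = { 0, 0 };

    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    return select(width, &readfds, NULL, NULL, &tv);
}

/* Hypothetical demo: returns 1 if a width of 8 misses pending data on a
 * descriptor numbered 33 or higher while a width of fd + 1 sees it,
 * 0 otherwise, -1 on setup failure. */
int stale_width_misses_data(void)
{
    int p[2], high_fd, stale, fresh;

    if (pipe(p) == -1)
        return -1;
    /* Duplicate the read end onto the lowest free descriptor >= 33. */
    high_fd = fcntl(p[0], F_DUPFD, 33);
    if (high_fd == -1 || write(p[1], "x", 1) != 1) {
        if (high_fd != -1)
            close(high_fd);
        close(p[0]);
        close(p[1]);
        return -1;
    }
    stale = poll_readable(high_fd, 8);            /* nfds too small */
    fresh = poll_readable(high_fd, high_fd + 1);  /* nfds covers the fd */
    close(high_fd);
    close(p[0]);
    close(p[1]);
    return stale == 0 && fresh == 1;
}
```

With nfds stuck at 8, descriptors 9 and up in the fstat output above would
never be reported as readable, which matches the CLOSE_WAIT connections with
56 bytes sitting in Recv-Q.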
Here is where I think the bug is:
ypbind implements its own select() loop, using svc_fdset and svc_getreqset()
to process RPC requests when some are pending. However, width
(the nfds passed to select()) is computed once, before the loop (see ypbind.c
starting at line 603).
As I understand it, svc_getreqset() may accept TCP connections for incoming
RPC requests. When it does, the RPC library adds the new TCP socket
to svc_fdset and adjusts svc_maxfd. So we need to compute width
inside the loop, at least after each call to svc_getreqset().
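
A rough, self-contained sketch of the difference follows. It is not the
ypbind code: watch_set and watch_maxfd are stand-ins for svc_fdset and
svc_maxfd, pipes stand in for sockets, and the inner loop plays the role of
svc_getreqset(), registering a new, higher descriptor while it handles a
request. The function serve() and its recompute_width flag are made up for
the illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Stand-ins for the RPC library's svc_fdset and svc_maxfd. */
static fd_set watch_set;
static int watch_maxfd;

static void watch(int fd)
{
    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_SET(fd, &watch_set);
    if (fd > watch_maxfd)
        watch_maxfd = fd;
}

/* Runs a short select() loop and returns how many pending bytes were
 * handled (-1 on setup failure). If recompute_width is 0, width is
 * computed once before the loop; otherwise it is refreshed every pass. */
int serve(int recompute_width)
{
    int a[2], b[2], hi, width, fd, round, handled = 0;
    char c;

    if (pipe(a) == -1)
        return -1;
    if (pipe(b) == -1) {
        close(a[0]); close(a[1]);
        return -1;
    }
    hi = fcntl(b[0], F_DUPFD, 33);  /* a "new connection" on a high fd */
    if (hi == -1 || write(a[1], "x", 1) != 1 || write(b[1], "y", 1) != 1) {
        if (hi != -1)
            close(hi);
        close(a[0]); close(a[1]); close(b[0]); close(b[1]);
        return -1;
    }

    FD_ZERO(&watch_set);
    watch_maxfd = -1;
    watch(a[0]);
    width = watch_maxfd + 1;        /* computed before the loop */

    for (round = 0; round < 3; round++) {
        fd_set ready = watch_set;
        struct timeval tv = { 0, 0 };

        if (recompute_width)
            width = watch_maxfd + 1;
        if (select(width, &ready, NULL, NULL, &tv) <= 0)
            break;
        /* Stand-in for svc_getreqset(&ready). */
        for (fd = 0; fd < width; fd++) {
            if (!FD_ISSET(fd, &ready))
                continue;
            if (read(fd, &c, 1) == 1)
                handled++;
            if (fd == a[0])
                watch(hi);          /* like accepting a TCP connection */
        }
    }
    close(hi);
    close(a[0]); close(a[1]); close(b[0]); close(b[1]);
    return handled;
}
```

With the width computed once, the data on the newly registered descriptor is
never seen; with the width refreshed on each pass it is handled on the next
iteration. In ypbind the equivalent change would presumably be to derive
width from svc_maxfd inside the loop, though I have not tested that here.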
Did I miss something?
--
Manuel Bouyer, LIP6, Universite Paris VI. Manuel.Bouyer@lip6.fr
NetBSD: 24 years of experience will always make the difference
--