Subject: RPC bug in ypbind ?
To: None <tech-userlevel@netbsd.org>
From: Manuel Bouyer <bouyer@antioche.lip6.fr>
List: tech-userlevel
Date: 09/02/2003 18:28:51
Hi,
I think I found a bug in ypbind in how svc_maxfd/svc_fdset is used.
But I'm not a TCP expert, so I may be wrong.

Occasionally ypbind hangs (problem also reported by manu@). ps shows that
ypbind is blocked on select(), netstat shows that there are some TCP
connections with data in the receive queue:
desssrv:/var/crash#ps -axlww -M netbsd.1.core |grep ypbind
  0   118 -1077951044   0   2   0 3428   0 select   Ts   ?? 0:00.00 /usr/sbin/ypbind 
desssrv:/var/crash#netstat -f inet -M netbsd.1.core 
Active Internet connections
Proto Recv-Q Send-Q  Local Address          Foreign Address        State
tcp       56      0  localhost.1021         localhost.879          CLOSE_WAIT
tcp        0      0  localhost.879          localhost.1021         FIN_WAIT_2
tcp        0      0  desssrv.ssh            sphinx.1017            ESTABLISHED
tcp       56      0  localhost.1021         localhost.922          CLOSE_WAIT
tcp       56      0  localhost.1021         localhost.923          CLOSE_WAIT
tcp       56      0  localhost.1021         localhost.924          CLOSE_WAIT
tcp       56      0  localhost.1021         localhost.925          CLOSE_WAIT
tcp       56      0  localhost.1021         localhost.926          CLOSE_WAIT
tcp       56      0  localhost.1021         localhost.927          CLOSE_WAIT
tcp       56      0  localhost.1021         localhost.928          CLOSE_WAIT
tcp       56      0  localhost.1021         localhost.929          CLOSE_WAIT
tcp       56      0  localhost.1021         localhost.930          CLOSE_WAIT
tcp       56      0  localhost.1021         localhost.931          CLOSE_WAIT
tcp       56      0  localhost.1021         localhost.932          CLOSE_WAIT
tcp       56      0  localhost.1021         localhost.933          CLOSE_WAIT
tcp       56      0  localhost.1021         localhost.934          CLOSE_WAIT
tcp       56      0  localhost.1021         localhost.935          CLOSE_WAIT
tcp       56      0  localhost.1021         localhost.936          CLOSE_WAIT
tcp       56      0  localhost.1021         localhost.937          CLOSE_WAIT
tcp       56      0  localhost.1021         localhost.938          CLOSE_WAIT
tcp       56      0  localhost.1021         localhost.939          CLOSE_WAIT
tcp       56      0  localhost.1021         localhost.940          CLOSE_WAIT
tcp       56      0  localhost.1021         localhost.941          CLOSE_WAIT
tcp       56      0  localhost.1021         localhost.942          CLOSE_WAIT
tcp       56      0  localhost.1021         localhost.943          CLOSE_WAIT
tcp       56      0  localhost.1021         localhost.944          CLOSE_WAIT
tcp        0      0  desssrv.ssh            bacchus.891            ESTABLISHED
tcp        0      0  desssrv.ssh            bacchus.892            ESTABLISHED
tcp        0      0  desssrv.ssh            bacchus.894            ESTABLISHED
tcp        0      0  localhost.domain       *.*                    LISTEN
tcp        0      0  desssrv264.domain      *.*                    LISTEN
tcp        0      0  desssrv.domain         *.*                    LISTEN
tcp        0      0  desssrv260.domain      *.*                    LISTEN
udp        0      0  localhost.ntp          *.*                   
udp        0      0  desssrv264.ntp         *.*                   
udp        0      0  desssrv.ntp            *.*                   
udp        0      0  desssrv260.ntp         *.*                   
udp        0      0  localhost.domain       *.*                   
udp        0      0  desssrv264.domain      *.*                   
udp        0      0  desssrv.domain         *.*                   
udp        0      0  desssrv260.domain      *.*                   
desssrv:/var/crash#fstat -M netbsd.1.core |grep ypbind
root     ypbind       118   wd -         -        none    -
root     ypbind       118    0 -         -        none    -
root     ypbind       118    1 -         -        none    -
root     ypbind       118    2 -         -        none    -
root     ypbind       118    3 -         -        none    -
root     ypbind       118    4* internet dgram udp c05ce5a0 *:1020
root     ypbind       118    5* internet stream tcp c06f9768 *:1021
root     ypbind       118    6* internet dgram udp c05ce660 *:65532
root     ypbind       118    7* internet dgram udp c05ce6c0 *:65529
root     ypbind       118    8 -         -        none    -
root     ypbind       118    9* internet stream tcp c0764b20 127.0.0.1:1021 <-> 127.0.0.1:944
root     ypbind       118   10* internet stream tcp c07643b8 127.0.0.1:1021 <-> 127.0.0.1:943
root     ypbind       118   11* internet stream tcp c0764140 127.0.0.1:1021 <-> 127.0.0.1:942
root     ypbind       118   12* internet stream tcp c076427c 127.0.0.1:1021 <-> 127.0.0.1:941
root     ypbind       118   13* internet stream tcp c07894f8 127.0.0.1:1021 <-> 127.0.0.1:940
root     ypbind       118   14* internet stream tcp c0789144 127.0.0.1:1021 <-> 127.0.0.1:939
root     ypbind       118   15* internet stream tcp c0789280 127.0.0.1:1021 <-> 127.0.0.1:938
root     ypbind       118   16* internet stream tcp c07898ac 127.0.0.1:1021 <-> 127.0.0.1:937
root     ypbind       118   17* internet stream tcp c0789b24 127.0.0.1:1021 <-> 127.0.0.1:936
root     ypbind       118   18* internet stream tcp c0789d9c 127.0.0.1:1021 <-> 127.0.0.1:935
root     ypbind       118   19* internet stream tcp c07b2148 127.0.0.1:1021 <-> 127.0.0.1:934
root     ypbind       118   20* internet stream tcp c07b23c0 127.0.0.1:1021 <-> 127.0.0.1:933
root     ypbind       118   21* internet stream tcp c07b2638 127.0.0.1:1021 <-> 127.0.0.1:932
root     ypbind       118   22* internet stream tcp c07b28b0 127.0.0.1:1021 <-> 127.0.0.1:931
root     ypbind       118   23* internet stream tcp c07b2b28 127.0.0.1:1021 <-> 127.0.0.1:930
root     ypbind       118   24* internet stream tcp c07b2da0 127.0.0.1:1021 <-> 127.0.0.1:929
root     ypbind       118   25* internet stream tcp c07b314c 127.0.0.1:1021 <-> 127.0.0.1:928
root     ypbind       118   26* internet stream tcp c07b33c4 127.0.0.1:1021 <-> 127.0.0.1:927
root     ypbind       118   27* internet stream tcp c07b363c 127.0.0.1:1021 <-> 127.0.0.1:926
root     ypbind       118   28* internet stream tcp c07b38b4 127.0.0.1:1021 <-> 127.0.0.1:925
root     ypbind       118   29* internet stream tcp c07b3b2c 127.0.0.1:1021 <-> 127.0.0.1:924
root     ypbind       118   30* internet stream tcp c07b3da4 127.0.0.1:1021 <-> 127.0.0.1:923
root     ypbind       118   31* internet stream tcp c07b5150 127.0.0.1:1021 <-> 127.0.0.1:922
root     ypbind       118   33* internet stream tcp c07b3500 127.0.0.1:1021 <-> 127.0.0.1:879
root     ypbind       118   34* internet stream tcp


Investigating further, I found that ypbind is select()ing only on file
descriptors 0 through 7 (the nfds passed to select() is 8), so the TCP
connections on descriptors 9 and up in the fstat output are never examined.
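
To illustrate what the nfds value means for select(), here is a minimal
sketch in C. It is not taken from ypbind: the helper names poll_readable()
and stale_nfds_hides_data() are made up for this example, and it only relies
on pipe() and select(). It puts one byte into a pipe and polls the read end
twice, once with an nfds that stops just below the descriptor and once with
an nfds that covers it.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper: poll one descriptor for readability with a
 * caller-chosen nfds and a zero timeout. Returns select()'s result,
 * i.e. the number of ready descriptors (or -1 on error).
 */
int poll_readable(int fd, int nfds)
{
    fd_set rd;
    struct timeval tv = { 0, 0 };

    FD_ZERO(&rd);
    FD_SET(fd, &rd);
    return select(nfds, &rd, NULL, NULL, &tv);
}

/*
 * Hypothetical demo: returns 1 if a too-small nfds hides pending data
 * that a large enough nfds reveals, 0 if not, -1 on setup failure.
 */
int stale_nfds_hides_data(void)
{
    int p[2];
    int stale, fresh;

    if (pipe(p) == -1)
        return -1;
    if (write(p[1], "x", 1) != 1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    /* nfds == p[0] covers descriptors 0 .. p[0]-1, so p[0] is skipped. */
    stale = poll_readable(p[0], p[0]);
    /* nfds == p[0]+1 covers descriptors 0 .. p[0]. */
    fresh = poll_readable(p[0], p[0] + 1);

    close(p[0]);
    close(p[1]);
    return stale == 0 && fresh == 1;
}
```

With the byte still unread, the first poll should report nothing ready and
the second should report the read end, which is the same situation as
connections with data in Recv-Q that select() never wakes up for.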

Here is where I think the bug is:
ypbind implements its own select() loop, using svc_fdset and svc_getreqset()
to process RPC requests when some are pending. However, width
(the nfds passed to select()) is computed once, before the loop (see ypbind.c
starting at line 603).
As I understand it, svc_getreqset() may accept TCP connections for incoming
RPC requests. When it does, the RPC library adds the new TCP socket
to svc_fdset and adjusts svc_maxfd. So we need to compute width
inside the loop, at least after each call to svc_getreqset().
Did I miss something?
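
The suspected failure and the proposed fix can be modelled without the RPC
library. This is only a sketch under assumptions: model_fdset, model_maxfd,
model_register() and run_loop() are hypothetical stand-ins for svc_fdset,
svc_maxfd and what the library does when it accepts a connection. None of it
is ypbind or libc code. A second socketpair plays the role of a TCP
connection accepted while the loop is already running.

```c
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical stand-ins for the RPC library's svc_fdset and svc_maxfd. */
fd_set model_fdset;
int model_maxfd;

/* Models the library registering a freshly accepted connection. */
void model_register(int fd)
{
    FD_SET(fd, &model_fdset);
    if (fd > model_maxfd)
        model_maxfd = fd;
}

/*
 * Run `passes` iterations of a select() loop and return how many of them
 * saw a ready descriptor, or -1 on error. A new connection carrying one
 * unread byte is registered during the first pass. With recompute == 0,
 * width is computed once before the loop; with recompute != 0 it is
 * recomputed on every pass.
 */
int run_loop(int recompute, int passes)
{
    int first[2], late[2], i, n, seen = 0, width;
    struct timeval tv;
    fd_set rd;

    FD_ZERO(&model_fdset);
    model_maxfd = -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, first) == -1)
        return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, late) == -1) {
        close(first[0]);
        close(first[1]);
        return -1;
    }
    model_register(first[0]);
    width = model_maxfd + 1;            /* computed before the loop */

    for (i = 0; i < passes; i++) {
        if (recompute)
            width = model_maxfd + 1;    /* recomputed inside the loop */
        rd = model_fdset;
        tv.tv_sec = 0;
        tv.tv_usec = 0;
        n = select(width, &rd, NULL, NULL, &tv);
        if (n == -1) {
            seen = -1;
            break;
        }
        if (n > 0)
            seen++;
        if (i == 0) {                   /* connection accepted mid-loop */
            model_register(late[0]);
            if (write(late[1], "x", 1) != 1) {
                seen = -1;
                break;
            }
        }
    }

    close(first[0]);
    close(first[1]);
    close(late[0]);
    close(late[1]);
    return seen;
}
```

Calling run_loop(0, 3) models the current behaviour: the late descriptor
lies beyond the cached width and is never reported. Calling run_loop(1, 3)
models the proposed change, where the pending byte is reported on each pass
after the connection is registered.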

--
Manuel Bouyer, LIP6, Universite Paris VI.           Manuel.Bouyer@lip6.fr
     NetBSD: 24 years of experience will always make the difference
--