Subject: bin/23135: broadcast ypbind sometimes fails to find servers
To: None <gnats-bugs@gnats.netbsd.org>
From: Manuel Bouyer <Manuel.Bouyer@lip6.fr>
List: netbsd-bugs
Date: 10/12/2003 15:38:33
>Number:         23135
>Category:       bin
>Synopsis:       broadcast ypbind sometimes fails to find servers
>Confidential:   no
>Severity:       serious
>Priority:       low
>Responsible:    bin-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Oct 12 13:39:00 UTC 2003
>Closed-Date:
>Last-Modified:
>Originator:     Manuel Bouyer
>Release:        NetBSD 1.6.1_STABLE
>Organization:
	LIP6/RP
>Environment:
System: NetBSD desssrv 1.6.1_STABLE NetBSD 1.6.1_STABLE (ANTILOQUE) #1: Tue Aug 26 12:26:24 CEST 2003 bouyer@vlaminck:/usr/src/syssrc/sys/arch/i386/compile/ANTILOQUE i386
Architecture: i386
Machine: i386
>Description:
	We have a small student room composed of 24 desktop and a NFS/NIS
	server, all running NetBSD 1.6.1_STABLE. We let ypbind find the server
	from broadcast.
	At boot time, clients sometimes hang on ypbind start, failing
	to find a server. The server is running fine (another client rebooted
	at the same time don't have problems finding the server).
	A tcpdump on the server shows that the client is bradcasting requests,
	but the server doesn't anserws (broadcast address is 132.227.72.63):
15:06:07.893277 dess260-11.65533 > 132.227.72.63.sunrpc: udp 104
0x0000   4500 0084 b637 0000 4011 29f9 84e3 4833        E....7..@.)...H3
0x0010   84e3 483f fffd 006f 0070 a9e3 0000 0000        ..H?...o.p......
0x0020   0000 0000 0000 0002 0001 86a0 0000 0002        ................
0x0030   0000 0005 0000 0001 0000 0024 3f89 51bd        ...........$?.Q.
0x0040   0000 000a 6465 7373 3236 302d 3131 0000        ....dess260-11..
0x0050   0000 0000 0000 0000 0000 0001 0000 0000        ................
0x0060   0000 0000 0000 0000 0001 86a4 0000 0002        ................
0x0070   0000 0002 0000 000c 0000 0007 7465 6c65        ............tele
0x0080   696e 6600                                      inf.
15:06:10.857553 dess260-13.65533 > 132.227.72.63.sunrpc: udp 104
0x0000   4500 0084 b632 0000 4011 29fc 84e3 4835        E....2..@.)...H5
0x0010   84e3 483f fffd 006f 0070 a9dc 0000 0000        ..H?...o.p......
0x0020   0000 0000 0000 0002 0001 86a0 0000 0002        ................
0x0030   0000 0005 0000 0001 0000 0024 3f89 51c0        ...........$?.Q.
0x0040   0000 000a 6465 7373 3236 302d 3133 0000        ....dess260-13..
0x0050   0000 0000 0000 0000 0000 0001 0000 0000        ................
0x0060   0000 0000 0000 0000 0001 86a4 0000 0002        ................
0x0070   0000 0002 0000 000c 0000 0007 7465 6c65        ............tele
0x0080   696e 6600                                      inf.
15:06:13.953387 dess260-11.65533 > 132.227.72.63.sunrpc: udp 104
0x0000   4500 0084 b639 0000 4011 29f7 84e3 4833        E....9..@.)...H3
0x0010   84e3 483f fffd 006f 0070 a9dd 0000 0000        ..H?...o.p......
0x0020   0000 0000 0000 0002 0001 86a0 0000 0002        ................
0x0030   0000 0005 0000 0001 0000 0024 3f89 51c3        ...........$?.Q.
0x0040   0000 000a 6465 7373 3236 302d 3131 0000        ....dess260-11..
0x0050   0000 0000 0000 0000 0000 0001 0000 0000        ................
0x0060   0000 0000 0000 0000 0001 86a4 0000 0002        ................
0x0070   0000 0002 0000 000c 0000 0007 7465 6c65        ............tele
0x0080   696e 6600                                      inf.
15:06:16.917674 dess260-13.65533 > 132.227.72.63.sunrpc: udp 104
0x0000   4500 0084 b634 0000 4011 29fa 84e3 4835        E....4..@.)...H5
0x0010   84e3 483f fffd 006f 0070 a9d6 0000 0000        ..H?...o.p......
0x0020   0000 0000 0000 0002 0001 86a0 0000 0002        ................
0x0030   0000 0005 0000 0001 0000 0024 3f89 51c6        ...........$?.Q.
0x0040   0000 000a 6465 7373 3236 302d 3133 0000        ....dess260-13..
0x0050   0000 0000 0000 0000 0000 0001 0000 0000        ................
0x0060   0000 0000 0000 0000 0001 86a4 0000 0002        ................
0x0070   0000 0002 0000 000c 0000 0007 7465 6c65        ............tele
0x0080   696e 6600                                      inf.

	A ktrace of rpcbind shows that it does get the requests, but it
	seems to just ignore it:

   110 rpcbind  1065963973.953476 RET   poll 1
   110 rpcbind  1065963973.953533 CALL  recvfrom(0x6,0x805d000,0x2000,0,0xbfbfce
60,0xbfbfce5c)
   110 rpcbind  1065963973.953593 GIO   fd 6 read 104 bytes
       "\0\0\0\0\0\0\0\0\0\0\0\^B\0\^A\M^F\240\0\0\0\^B\0\0\0\^E\0\0\0\^A\0\0\
        \0$?\M^IQ\M-C\0\0\0
        dess260-11\0\0\0\0\0\0\0\0\0\0\0\0\0\^A\0\0\0\0\0\0\0\0\0\0\0\0\0\^A\
        \M^F\M-$\0\0\0\^B\0\0\0\^B\0\0\0\f\0\0\0\ateleinf\0"
   110 rpcbind  1065963973.953639 RET   recvfrom 104/0x68
   110 rpcbind  1065963973.953692 CALL  getsockname(0x6,0xbfbdcd60,0xbfbdcd5c)
   110 rpcbind  1065963973.953742 RET   getsockname 0
   110 rpcbind  1065963973.953777 CALL  getsockopt(0x6,0xffff,0x1008,0xbfbdcd58,
0xbfbdcd5c)
   110 rpcbind  1065963973.953822 RET   getsockopt 0
   110 rpcbind  1065963973.953900 CALL  open(0x480ff287,0,0x1b6)
   110 rpcbind  1065963973.953953 NAMI  "/etc/netconfig"
   110 rpcbind  1065963973.954027 RET   open 3
   110 rpcbind  1065963973.954127 CALL  __fstat13(0x3,0xbfbdcbb0)
   110 rpcbind  1065963973.954175 RET   __fstat13 0
   110 rpcbind  1065963973.954213 CALL  read(0x3,0x8062000,0x2000)
   110 rpcbind  1065963973.954268 GIO   fd 3 read 774 bytes
       "# $NetBSD: netconfig,v 1.1 2000/06/02 22:54:10 fvdl Exp $
        #
        # The network configuration file. This file is currently only used in
        # conjunction with the (TI-) RPC code in the C library, unlike its
        # use in SVR4.
        #
        # Entries consist of:
        #
        #       <network_id> <semantics> <flags> <protofamily> <protoname> \
        \134
        #               <device> <nametoaddr_libs>
        #
        # The <device> and <nametoaddr_libs> fields are always empty in NetBSD\
        .
        #
        udp6       tpi_clts      v     inet6    udp     -       -
        tcp6       tpi_cots_ord  v     inet6    tcp     -       -
        udp        tpi_clts      v     inet     udp     -       -
        tcp        tpi_cots_ord  v     inet     tcp     -       -
        rawip      tpi_raw       -     inet      -      -       -
        local      tpi_cots_ord  -     loopback  -      -       -
       "
   110 rpcbind  1065963973.954329 RET   read 774/0x306
   110 rpcbind  1065963973.954420 CALL  close(0x3)
   110 rpcbind  1065963973.954479 RET   close 0
   110 rpcbind  1065963973.954549 CALL  __sysctl(0xbfbdccd8,0x6,0,0xbfbdccd4,0,0
)
   110 rpcbind  1065963973.954613 RET   __sysctl 0
   110 rpcbind  1065963973.954651 CALL  __sysctl(0xbfbdccd8,0x6,0x805bc00,0xbfbd
ccd4,0,0)
   110 rpcbind  1065963973.954767 RET   __sysctl 0
   110 rpcbind  1065963973.954944 CALL  open(0x480ff287,0,0x1b6)
   110 rpcbind  1065963973.954998 NAMI  "/etc/netconfig"
   110 rpcbind  1065963973.955064 RET   open 3
   110 rpcbind  1065963973.955110 CALL  __fstat13(0x3,0xbfbdcba0)
   110 rpcbind  1065963973.955155 RET   __fstat13 0
   110 rpcbind  1065963973.955191 CALL  read(0x3,0x8062000,0x2000)
   110 rpcbind  1065963973.955240 GIO   fd 3 read 774 bytes
       "# $NetBSD: netconfig,v 1.1 2000/06/02 22:54:10 fvdl Exp $
        #
        # The network configuration file. This file is currently only used in
        # conjunction with the (TI-) RPC code in the C library, unlike its
        # use in SVR4.
        #
        # Entries consist of:
        #
        #       <network_id> <semantics> <flags> <protofamily> <protoname> \
        \134
        #               <device> <nametoaddr_libs>
        #
        # The <device> and <nametoaddr_libs> fields are always empty in NetBSD\
        .
        #
        udp6       tpi_clts      v     inet6    udp     -       -
        tcp6       tpi_cots_ord  v     inet6    tcp     -       -
        udp        tpi_clts      v     inet     udp     -       -
        tcp        tpi_cots_ord  v     inet     tcp     -       -
        rawip      tpi_raw       -     inet      -      -       -
        local      tpi_cots_ord  -     loopback  -      -       -
       "
   110 rpcbind  1065963973.955306 RET   read 774/0x306
   110 rpcbind  1065963973.955392 CALL  close(0x3)
   110 rpcbind  1065963973.955451 RET   close 0
   110 rpcbind  1065963973.955512 CALL  __sysctl(0xbfbdccc8,0x6,0,0xbfbdccc4,0,0
)
   110 rpcbind  1065963973.955569 RET   __sysctl 0
   110 rpcbind  1065963973.955607 CALL  __sysctl(0xbfbdccc8,0x6,0x805bc00,0xbfbd
ccc4,0,0)
   110 rpcbind  1065963973.955708 RET   __sysctl 0
   110 rpcbind  1065963973.955978 CALL  gettimeofday(0xbfbdcd88,0)
   110 rpcbind  1065963973.956034 RET   gettimeofday 0
   110 rpcbind  1065963973.956075 CALL  gettimeofday(0xbfbdcd88,0)
   110 rpcbind  1065963973.956115 RET   gettimeofday 0
   110 rpcbind  1065963973.956155 CALL  poll(0xbfbfd4a0,0x4,0x7530)
   110 rpcbind  1065963975.100258 RET   poll 1
   110 rpcbind  1065963975.100399 CALL  recvfrom(0x6,0x805d000,0x2000,0,0xbfbfce
60,0xbfbfce5c)
   110 rpcbind  1065963975.100478 GIO   fd 6 read 104 bytes
       "\0\0\0\0\0\0\0\0\0\0\0\^B\0\^A\M^F\240\0\0\0\^B\0\0\0\^E\0\0\0\^A\0\0\
        \0$?\M^IQ\M-G\0\0\0
        dess264-04\0\0\0\0\0\0\0\0\0\0\0\0\0\^A\0\0\0\0\0\0\0\0\0\0\0\0\0\^A\
        \M^F\M-$\0\0\0\^B\0\0\0\^B\0\0\0\f\0\0\0\ateleinf\0"


	I don't know why rpcbind ignores the request, I guess something
	in the packet he did get. ethereal doens't show any difference
	between a request which gets handed, and one which gets ignored.
	I don't know well enouth the internals of RPC to be able to
	decode the packet and see why it would fail.
	
>How-To-Repeat:
	Not easy to reproduce on demand. rebooting a client will hang once
	in about 20 reboots.
>Fix:
	workaround: list servers in /var/yp/binding/<domain>.ypservers
>Release-Note:
>Audit-Trail:
>Unformatted: