Subject: bin/23135: broadcast ypbind sometimes fails to find servers
To: None <gnats-bugs@gnats.netbsd.org>
From: Manuel Bouyer <Manuel.Bouyer@lip6.fr>
List: netbsd-bugs
Date: 10/12/2003 15:38:33
>Number: 23135
>Category: bin
>Synopsis: broadcast ypbind sometimes fails to find servers
>Confidential: no
>Severity: serious
>Priority: low
>Responsible: bin-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sun Oct 12 13:39:00 UTC 2003
>Closed-Date:
>Last-Modified:
>Originator: Manuel Bouyer
>Release: NetBSD 1.6.1_STABLE
>Organization:
LIP6/RP
>Environment:
System: NetBSD desssrv 1.6.1_STABLE NetBSD 1.6.1_STABLE (ANTILOQUE) #1: Tue Aug 26 12:26:24 CEST 2003 bouyer@vlaminck:/usr/src/syssrc/sys/arch/i386/compile/ANTILOQUE i386
Architecture: i386
Machine: i386
>Description:
We have a small student room composed of 24 desktop and a NFS/NIS
server, all running NetBSD 1.6.1_STABLE. We let ypbind find the server
from broadcast.
At boot time, clients sometimes hang on ypbind start, failing
to find a server. The server is running fine (another client rebooted
at the same time don't have problems finding the server).
A tcpdump on the server shows that the client is bradcasting requests,
but the server doesn't anserws (broadcast address is 132.227.72.63):
15:06:07.893277 dess260-11.65533 > 132.227.72.63.sunrpc: udp 104
0x0000 4500 0084 b637 0000 4011 29f9 84e3 4833 E....7..@.)...H3
0x0010 84e3 483f fffd 006f 0070 a9e3 0000 0000 ..H?...o.p......
0x0020 0000 0000 0000 0002 0001 86a0 0000 0002 ................
0x0030 0000 0005 0000 0001 0000 0024 3f89 51bd ...........$?.Q.
0x0040 0000 000a 6465 7373 3236 302d 3131 0000 ....dess260-11..
0x0050 0000 0000 0000 0000 0000 0001 0000 0000 ................
0x0060 0000 0000 0000 0000 0001 86a4 0000 0002 ................
0x0070 0000 0002 0000 000c 0000 0007 7465 6c65 ............tele
0x0080 696e 6600 inf.
15:06:10.857553 dess260-13.65533 > 132.227.72.63.sunrpc: udp 104
0x0000 4500 0084 b632 0000 4011 29fc 84e3 4835 E....2..@.)...H5
0x0010 84e3 483f fffd 006f 0070 a9dc 0000 0000 ..H?...o.p......
0x0020 0000 0000 0000 0002 0001 86a0 0000 0002 ................
0x0030 0000 0005 0000 0001 0000 0024 3f89 51c0 ...........$?.Q.
0x0040 0000 000a 6465 7373 3236 302d 3133 0000 ....dess260-13..
0x0050 0000 0000 0000 0000 0000 0001 0000 0000 ................
0x0060 0000 0000 0000 0000 0001 86a4 0000 0002 ................
0x0070 0000 0002 0000 000c 0000 0007 7465 6c65 ............tele
0x0080 696e 6600 inf.
15:06:13.953387 dess260-11.65533 > 132.227.72.63.sunrpc: udp 104
0x0000 4500 0084 b639 0000 4011 29f7 84e3 4833 E....9..@.)...H3
0x0010 84e3 483f fffd 006f 0070 a9dd 0000 0000 ..H?...o.p......
0x0020 0000 0000 0000 0002 0001 86a0 0000 0002 ................
0x0030 0000 0005 0000 0001 0000 0024 3f89 51c3 ...........$?.Q.
0x0040 0000 000a 6465 7373 3236 302d 3131 0000 ....dess260-11..
0x0050 0000 0000 0000 0000 0000 0001 0000 0000 ................
0x0060 0000 0000 0000 0000 0001 86a4 0000 0002 ................
0x0070 0000 0002 0000 000c 0000 0007 7465 6c65 ............tele
0x0080 696e 6600 inf.
15:06:16.917674 dess260-13.65533 > 132.227.72.63.sunrpc: udp 104
0x0000 4500 0084 b634 0000 4011 29fa 84e3 4835 E....4..@.)...H5
0x0010 84e3 483f fffd 006f 0070 a9d6 0000 0000 ..H?...o.p......
0x0020 0000 0000 0000 0002 0001 86a0 0000 0002 ................
0x0030 0000 0005 0000 0001 0000 0024 3f89 51c6 ...........$?.Q.
0x0040 0000 000a 6465 7373 3236 302d 3133 0000 ....dess260-13..
0x0050 0000 0000 0000 0000 0000 0001 0000 0000 ................
0x0060 0000 0000 0000 0000 0001 86a4 0000 0002 ................
0x0070 0000 0002 0000 000c 0000 0007 7465 6c65 ............tele
0x0080 696e 6600 inf.
A ktrace of rpcbind shows that it does get the requests, but it
seems to just ignore it:
110 rpcbind 1065963973.953476 RET poll 1
110 rpcbind 1065963973.953533 CALL recvfrom(0x6,0x805d000,0x2000,0,0xbfbfce
60,0xbfbfce5c)
110 rpcbind 1065963973.953593 GIO fd 6 read 104 bytes
"\0\0\0\0\0\0\0\0\0\0\0\^B\0\^A\M^F\240\0\0\0\^B\0\0\0\^E\0\0\0\^A\0\0\
\0$?\M^IQ\M-C\0\0\0
dess260-11\0\0\0\0\0\0\0\0\0\0\0\0\0\^A\0\0\0\0\0\0\0\0\0\0\0\0\0\^A\
\M^F\M-$\0\0\0\^B\0\0\0\^B\0\0\0\f\0\0\0\ateleinf\0"
110 rpcbind 1065963973.953639 RET recvfrom 104/0x68
110 rpcbind 1065963973.953692 CALL getsockname(0x6,0xbfbdcd60,0xbfbdcd5c)
110 rpcbind 1065963973.953742 RET getsockname 0
110 rpcbind 1065963973.953777 CALL getsockopt(0x6,0xffff,0x1008,0xbfbdcd58,
0xbfbdcd5c)
110 rpcbind 1065963973.953822 RET getsockopt 0
110 rpcbind 1065963973.953900 CALL open(0x480ff287,0,0x1b6)
110 rpcbind 1065963973.953953 NAMI "/etc/netconfig"
110 rpcbind 1065963973.954027 RET open 3
110 rpcbind 1065963973.954127 CALL __fstat13(0x3,0xbfbdcbb0)
110 rpcbind 1065963973.954175 RET __fstat13 0
110 rpcbind 1065963973.954213 CALL read(0x3,0x8062000,0x2000)
110 rpcbind 1065963973.954268 GIO fd 3 read 774 bytes
"# $NetBSD: netconfig,v 1.1 2000/06/02 22:54:10 fvdl Exp $
#
# The network configuration file. This file is currently only used in
# conjunction with the (TI-) RPC code in the C library, unlike its
# use in SVR4.
#
# Entries consist of:
#
# <network_id> <semantics> <flags> <protofamily> <protoname> \
\134
# <device> <nametoaddr_libs>
#
# The <device> and <nametoaddr_libs> fields are always empty in NetBSD\
.
#
udp6 tpi_clts v inet6 udp - -
tcp6 tpi_cots_ord v inet6 tcp - -
udp tpi_clts v inet udp - -
tcp tpi_cots_ord v inet tcp - -
rawip tpi_raw - inet - - -
local tpi_cots_ord - loopback - - -
"
110 rpcbind 1065963973.954329 RET read 774/0x306
110 rpcbind 1065963973.954420 CALL close(0x3)
110 rpcbind 1065963973.954479 RET close 0
110 rpcbind 1065963973.954549 CALL __sysctl(0xbfbdccd8,0x6,0,0xbfbdccd4,0,0
)
110 rpcbind 1065963973.954613 RET __sysctl 0
110 rpcbind 1065963973.954651 CALL __sysctl(0xbfbdccd8,0x6,0x805bc00,0xbfbd
ccd4,0,0)
110 rpcbind 1065963973.954767 RET __sysctl 0
110 rpcbind 1065963973.954944 CALL open(0x480ff287,0,0x1b6)
110 rpcbind 1065963973.954998 NAMI "/etc/netconfig"
110 rpcbind 1065963973.955064 RET open 3
110 rpcbind 1065963973.955110 CALL __fstat13(0x3,0xbfbdcba0)
110 rpcbind 1065963973.955155 RET __fstat13 0
110 rpcbind 1065963973.955191 CALL read(0x3,0x8062000,0x2000)
110 rpcbind 1065963973.955240 GIO fd 3 read 774 bytes
"# $NetBSD: netconfig,v 1.1 2000/06/02 22:54:10 fvdl Exp $
#
# The network configuration file. This file is currently only used in
# conjunction with the (TI-) RPC code in the C library, unlike its
# use in SVR4.
#
# Entries consist of:
#
# <network_id> <semantics> <flags> <protofamily> <protoname> \
\134
# <device> <nametoaddr_libs>
#
# The <device> and <nametoaddr_libs> fields are always empty in NetBSD\
.
#
udp6 tpi_clts v inet6 udp - -
tcp6 tpi_cots_ord v inet6 tcp - -
udp tpi_clts v inet udp - -
tcp tpi_cots_ord v inet tcp - -
rawip tpi_raw - inet - - -
local tpi_cots_ord - loopback - - -
"
110 rpcbind 1065963973.955306 RET read 774/0x306
110 rpcbind 1065963973.955392 CALL close(0x3)
110 rpcbind 1065963973.955451 RET close 0
110 rpcbind 1065963973.955512 CALL __sysctl(0xbfbdccc8,0x6,0,0xbfbdccc4,0,0
)
110 rpcbind 1065963973.955569 RET __sysctl 0
110 rpcbind 1065963973.955607 CALL __sysctl(0xbfbdccc8,0x6,0x805bc00,0xbfbd
ccc4,0,0)
110 rpcbind 1065963973.955708 RET __sysctl 0
110 rpcbind 1065963973.955978 CALL gettimeofday(0xbfbdcd88,0)
110 rpcbind 1065963973.956034 RET gettimeofday 0
110 rpcbind 1065963973.956075 CALL gettimeofday(0xbfbdcd88,0)
110 rpcbind 1065963973.956115 RET gettimeofday 0
110 rpcbind 1065963973.956155 CALL poll(0xbfbfd4a0,0x4,0x7530)
110 rpcbind 1065963975.100258 RET poll 1
110 rpcbind 1065963975.100399 CALL recvfrom(0x6,0x805d000,0x2000,0,0xbfbfce
60,0xbfbfce5c)
110 rpcbind 1065963975.100478 GIO fd 6 read 104 bytes
"\0\0\0\0\0\0\0\0\0\0\0\^B\0\^A\M^F\240\0\0\0\^B\0\0\0\^E\0\0\0\^A\0\0\
\0$?\M^IQ\M-G\0\0\0
dess264-04\0\0\0\0\0\0\0\0\0\0\0\0\0\^A\0\0\0\0\0\0\0\0\0\0\0\0\0\^A\
\M^F\M-$\0\0\0\^B\0\0\0\^B\0\0\0\f\0\0\0\ateleinf\0"
I don't know why rpcbind ignores the request, I guess something
in the packet he did get. ethereal doens't show any difference
between a request which gets handed, and one which gets ignored.
I don't know well enouth the internals of RPC to be able to
decode the packet and see why it would fail.
>How-To-Repeat:
Not easy to reproduce on demand. rebooting a client will hang once
in about 20 reboots.
>Fix:
workaround: list servers in /var/yp/binding/<domain>.ypservers
>Release-Note:
>Audit-Trail:
>Unformatted: