Subject: lib/9413: resolver weirdnesses
To: None <gnats-bugs@gnats.netbsd.org>
From: None <Thilo.Manske@HEH.Uni-Oldenburg.DE>
List: netbsd-bugs
Date: 02/13/2000 15:03:46
>Number: 9413
>Category: lib
>Synopsis: strange problems with NetBSD's resolver
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: lib-bug-people (Library Bug People)
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sun Feb 13 15:03:00 2000
>Last-Modified:
>Originator: Thilo Manske
>Organization:
This is Thilo's Unix signature! Have fun with it.
>Release: 4th and 12th February
>Environment:
System: NetBSD WintelKiller 1.4S NetBSD 1.4S (WintelKiller) #194: Sat Feb 12 15:16:01 MET 2000 thilo@WintelKiller:/usr/src/sys/arch/i386/compile/WintelKiller i386
No IPv6 enabled at all.
>Description:
I've discovered some strange behavior in NetBSD's resolver.
When I issue "rlogin seti-100", NetBSD sends these queries:
WintelKiller-100.65372 > seti-100.domain: 10782+ AAAA? seti-100.HEH.Uni-Oldenburg.DE. (47)
seti-100.domain > WintelKiller-100.65372: 10782 NXDomain* 0/1/0 (115)
WintelKiller-100.65371 > seti-100.domain: 10783+ AAAA? seti-100. (26)
seti-100.domain > WintelKiller-100.65371: 10783 NXDomain 0/1/0 (99)
(HEH.Uni-Oldenburg.DE is my default domain, set in /etc/resolv.conf via the
search option.)
What's strange is that
A) seti-100 is in /etc/hosts (and nsswitch.conf is set properly)
B) I don't have IPv6 enabled, so I'm not interested in "AAAA" records
C) Some tools do this (rlogin, telnet, ftp, rsh, tcpdump...) and some don't,
i.e. they work correctly (ping and ntpdate, for example, and host of course,
which doesn't use NetBSD's resolver, right?)
A dummy program using gethostbyname(3) works correctly as well.
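The dummy program itself is not included in this report. A minimal sketch of such a test might look like the following; the function name lookup_v4 is hypothetical, and it only assumes the standard gethostbyname(3) and inet_ntop(3) interfaces.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>

/*
 * Hypothetical helper (not from the original report): resolve `name`
 * with gethostbyname(3) and write the first IPv4 address to `out` in
 * dotted-quad form.  Returns 0 on success, -1 if the lookup fails.
 */
static int lookup_v4(const char *name, char *out, size_t outlen)
{
    struct hostent *hp = gethostbyname(name);
    struct in_addr addr;

    if (hp == NULL || hp->h_addrtype != AF_INET ||
        hp->h_addr_list[0] == NULL)
        return -1;

    /* h_addr_list entries are raw address bytes, not strings. */
    memcpy(&addr, hp->h_addr_list[0], sizeof(addr));
    if (inet_ntop(AF_INET, &addr, out, (socklen_t)outlen) == NULL)
        return -1;
    return 0;
}
```

Calling lookup_v4("seti-100", buf, sizeof(buf)) from a small main() while tcpdump is running shows whether this code path goes to the nameserver or is answered from /etc/hosts.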
When our site, and thus my caching nameserver (seti-100), lost its
connection to the rest of the 'net this Saturday, name resolution took
ages. (And that was the reason I started investigating this :-)
Here's a lookup of a nonexistent host (not present in /etc/hosts):
WintelKiller-100.65376 > seti-100.domain: 19293+ AAAA? doesnotexist.HEH.Uni-Oldenburg.DE. (51)
seti-100.domain > WintelKiller-100.65376: 19293 NXDomain* 0/1/0 (119)
WintelKiller-100.65375 > seti-100.domain: 19294+ AAAA? doesnotexist. (30)
seti-100.domain > WintelKiller-100.65375: 19294 NXDomain 0/1/0 (103)
WintelKiller-100.65374 > seti-100.domain: 19295+ A? doesnotexist.HEH.Uni-Oldenburg.DE. (51)
seti-100.domain > WintelKiller-100.65374: 19295 NXDomain* 0/1/0 (119)
WintelKiller-100.65373 > seti-100.domain: 19296+ A? doesnotexist. (30)
seti-100.domain > WintelKiller-100.65373: 19296 NXDomain 0/1/0 (103)
It looks like, in some cases, the resolver does the following in order:
1. sends IPv6 address requests (AAAA),
2. follows the rules in /etc/nsswitch.conf
I've quickly browsed the resolver sources in src/lib/libc/net looking for
a possibly undocumented nsswitch.conf option, but didn't find one.
I first noticed the problem with a world build from around 4th February, and
it still exists with yesterday's sources.
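To narrow down why some tools send AAAA queries and others don't, one could compare lookups restricted to a single address family. The sketch below assumes, without confirmation from anything in this report, that the affected tools resolve names through getaddrinfo(3) rather than gethostbyname(3). The helper name count_addrs is hypothetical.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/*
 * Hypothetical helper (an assumption, not taken from the report):
 * count the addresses getaddrinfo(3) returns for `name` when the
 * lookup is restricted to `family` (AF_INET, AF_INET6 or AF_UNSPEC).
 * Returns -1 if the lookup fails.
 */
static int count_addrs(const char *name, int family)
{
    struct addrinfo hints, *res, *ai;
    int n = 0;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = family;
    hints.ai_socktype = SOCK_STREAM;  /* avoid one entry per socket type */

    if (getaddrinfo(name, NULL, &hints, &res) != 0)
        return -1;
    for (ai = res; ai != NULL; ai = ai->ai_next)
        n++;
    freeaddrinfo(res);
    return n;
}
```

Running count_addrs("seti-100", AF_INET) and count_addrs("seti-100", AF_UNSPEC) in turn, with tcpdump watching port domain, would show whether the AAAA queries appear only for the unrestricted lookup.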
Here are parts of a ktrace of "rsh seti-100 echo"
(the interesting ones, I hope):
1190 rsh NAMI "/etc/nsswitch.conf"
1190 rsh RET __stat13 0
1190 rsh CALL open(0x480cf882,0,0x1b6)
1190 rsh NAMI "/etc/hosts"
1190 rsh RET open 3
1190 rsh CALL __fstat13(0x3,0xefbfd53c)
1190 rsh RET __fstat13 0
1190 rsh CALL read(0x3,0x8050000,0x2000)
1190 rsh GIO fd 3 read 64 bytes
"127.0.0.1 localhost
10.2.0.1 seti-100
10.2.0.2 WintelKiller-100
"
1190 rsh RET read 64/0x40
1190 rsh CALL read(0x3,0x8050000,0x2000)
1190 rsh GIO fd 3 read 0 bytes
""
1190 rsh RET read 0
1190 rsh CALL close(0x3)
1190 rsh RET close 0
1190 rsh CALL madvise(0x8050000,0x2000,0x6)
1190 rsh RET madvise 0
1190 rsh CALL gettimeofday(0xefbfcd98,0)
1190 rsh RET gettimeofday 0
1190 rsh CALL getpid
1190 rsh RET getpid 1190/0x4a6
1190 rsh CALL open(0x480d1237,0,0x1b6)
1190 rsh NAMI "/etc/resolv.conf"
1190 rsh RET open 3
1190 rsh CALL __fstat13(0x3,0xefbfccd8)
1190 rsh RET __fstat13 0
1190 rsh CALL read(0x3,0x8050000,0x2000)
1190 rsh GIO fd 3 read 48 bytes
"search HEH.Uni-Oldenburg.DE
nameserver 10.2.0.1
"
1190 rsh RET read 48/0x30
1190 rsh CALL read(0x3,0x8050000,0x2000)
1190 rsh GIO fd 3 read 0 bytes
""
1190 rsh RET read 0
1190 rsh CALL close(0x3)
1190 rsh RET close 0
1190 rsh CALL madvise(0x8050000,0x2000,0x6)
1190 rsh RET madvise 0
1190 rsh CALL socket(0x2,0x2,0)
1190 rsh RET socket 3
1190 rsh CALL connect(0x3,0x480e18a0,0x10)
1190 rsh RET connect 0
1190 rsh CALL sendto(0x3,0xefbfc9c0,0x2f,0,0,0)
1190 rsh GIO fd 3 wrote 47 bytes
"gL\^A\0\0\^A\0\0\0\0\0\0\bseti-100\^CHEH\rUni-Oldenburg\^BDE\0\0\^\\0\
\^A"
1190 rsh RET sendto 47/0x2f
1190 rsh CALL poll(0xefbfc778,0x1,0x1388)
1190 rsh RET poll 1
1190 rsh CALL recvfrom(0x3,0xefbfd25c,0x400,0,0xefbfc78c,0xefbfc774)
1190 rsh GIO fd 3 read 115 bytes
"gL\M^E\M^C\0\^A\0\0\0\^A\0\0\bseti-100\^CHEH\rUni-Oldenburg\^BDE\0\0\
\^\\0\^A\^CHEH\rUni-Oldenburg\^BDE\0\0\^F\0\^A\0\^AQ\M^@\0$\^FServer\
\M-@/\^Droot\M-@Ow5\M-b\M-U\0\08@\0\0\^N\^P\0 :\M^@\0\^AQ\M^@"
1190 rsh RET recvfrom 115/0x73
1190 rsh CALL close(0x3)
1190 rsh RET close 0
1190 rsh CALL socket(0x2,0x2,0)
1190 rsh RET socket 3
1190 rsh CALL connect(0x3,0x480e18a0,0x10)
1190 rsh RET connect 0
1190 rsh CALL sendto(0x3,0xefbfc9c0,0x1a,0,0,0)
1190 rsh GIO fd 3 wrote 26 bytes
"gM\^A\0\0\^A\0\0\0\0\0\0\bseti-100\0\0\^\\0\^A"
1190 rsh RET sendto 26/0x1a
1190 rsh CALL poll(0xefbfc778,0x1,0x1388)
1190 rsh RET poll 1
1190 rsh CALL recvfrom(0x3,0xefbfd25c,0x400,0,0xefbfc78c,0xefbfc774)
1190 rsh GIO fd 3 read 99 bytes
"gM\M^A\M^C\0\^A\0\0\0\^A\0\0\bseti-100\0\0\^\\0\^A\0\0\^F\0\^A\0\0"N\0\
>\^AA\fROOT-SERVERS\^CNET\0
hostmaster\binternic\M-@4w5\M-g4\0\0\a\b\0\0\^C\M^D\0 :\M^@\0\^AQ\
\M^@"
1190 rsh RET recvfrom 99/0x63
1190 rsh CALL close(0x3)
1190 rsh RET close 0
I hope this is enough information :-)
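The query type can be read directly from the first sendto() dump above: the 47 bytes end in \0 \^\ \0 \^A, that is 0x001c (28, AAAA) followed by 0x0001 (class IN). The following sketch decodes the question section of that packet. The byte array is transcribed from the ktrace output, and the function name parse_question is hypothetical. It handles only uncompressed names, which is all a query contains.

```c
#include <stddef.h>
#include <string.h>

/* The 47-byte query from the first sendto() in the ktrace above. */
static const unsigned char query[] = {
    'g', 'L',                            /* ID */
    0x01, 0x00,                          /* flags: recursion desired */
    0x00, 0x01,                          /* QDCOUNT = 1 */
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  /* ANCOUNT, NSCOUNT, ARCOUNT = 0 */
    8, 's', 'e', 't', 'i', '-', '1', '0', '0',
    3, 'H', 'E', 'H',
    13, 'U', 'n', 'i', '-', 'O', 'l', 'd', 'e', 'n', 'b', 'u', 'r', 'g',
    2, 'D', 'E',
    0,                                   /* root label ends the name */
    0x00, 0x1c,                          /* QTYPE 28 = AAAA */
    0x00, 0x01                           /* QCLASS 1 = IN */
};

/*
 * Decode the first question of a DNS query: write the dotted name to
 * `name` and the type and class to *qtype and *qclass.
 * Returns 0 on success, -1 on a malformed or truncated packet.
 * Compression pointers are rejected (label length > 63).
 */
static int parse_question(const unsigned char *pkt, size_t len,
                          char *name, size_t namelen,
                          unsigned *qtype, unsigned *qclass)
{
    size_t pos = 12;  /* the DNS header is always 12 bytes */
    size_t out = 0;

    if (len < 12 || namelen == 0)
        return -1;
    while (pos < len && pkt[pos] != 0) {
        size_t lablen = pkt[pos++];

        if (lablen > 63 || pos + lablen > len ||
            out + lablen + 1 >= namelen)
            return -1;
        memcpy(name + out, pkt + pos, lablen);
        out += lablen;
        name[out++] = '.';
        pos += lablen;
    }
    if (pos >= len)
        return -1;        /* ran off the end before the root label */
    pos++;                /* skip the root label */
    name[out] = '\0';

    if (pos + 4 > len)
        return -1;
    *qtype = ((unsigned)pkt[pos] << 8) | pkt[pos + 1];
    *qclass = ((unsigned)pkt[pos + 2] << 8) | pkt[pos + 3];
    return 0;
}
```

For the packet above this yields the name seti-100.HEH.Uni-Oldenburg.DE., type 28 and class 1, matching the "AAAA?" line tcpdump printed for the same lookup.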
>How-To-Repeat:
"tcpdump -vv -i <interface> port domain" in one shell
and "rlogin <somehost>" etc in another one.
rlogin host
>Fix:
>Audit-Trail:
>Unformatted: