Subject: lib/9413: resolver weirdnesses
To: None <gnats-bugs@gnats.netbsd.org>
From: None <Thilo.Manske@HEH.Uni-Oldenburg.DE>
List: netbsd-bugs
Date: 02/13/2000 15:03:46
>Number:         9413
>Category:       lib
>Synopsis:       strange problems with NetBSD's resolver
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    lib-bug-people (Library Bug People)
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Feb 13 15:03:00 2000
>Last-Modified:
>Originator:     Thilo Manske
>Organization:
This is Thilo's Unix signature! Have fun with it.
>Release:        4th and 12th February
>Environment:
	
System: NetBSD WintelKiller 1.4S NetBSD 1.4S (WintelKiller) #194: Sat Feb 12 15:16:01 MET 2000 thilo@WintelKiller:/usr/src/sys/arch/i386/compile/WintelKiller i386

No IPv6 enabled at all.

>Description:
I've discovered some strange behaviour in NetBSD's resolver:

When I run "rlogin seti-100", NetBSD sends these queries:

WintelKiller-100.65372 > seti-100.domain:  10782+ AAAA? seti-100.HEH.Uni-Oldenburg.DE. (47)
seti-100.domain > WintelKiller-100.65372:  10782 NXDomain* 0/1/0 (115)
WintelKiller-100.65371 > seti-100.domain:  10783+ AAAA? seti-100. (26)
seti-100.domain > WintelKiller-100.65371:  10783 NXDomain 0/1/0 (99)

(HEH.Uni-Oldenburg.DE is my default domain, set via the search option in
/etc/resolv.conf.)
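
The order of names in the trace is consistent with the usual search-list
handling: a name without a dot gets the search domain appended first and is
tried as-is afterwards. A minimal sketch of that rule, assuming the
traditional BIND-style behaviour with ndots=1 (the function name
candidate_names is hypothetical, and this is a simplified model, not the
libc code):

```python
def candidate_names(name, search, ndots=1):
    """Return the fully qualified names a BIND-style resolver would try, in order.

    Simplified model (an assumption, not the libc implementation):
    a trailing dot disables the search list; a name with at least
    `ndots` dots is tried as-is first, otherwise as-is comes last.
    """
    if name.endswith("."):
        return [name]
    as_is = name + "."
    searched = [f"{name}.{domain.rstrip('.')}." for domain in search]
    if name.count(".") >= ndots:
        return [as_is] + searched
    return searched + [as_is]


if __name__ == "__main__":
    # Prints the two names in the same order as the queries in the trace.
    for fqdn in candidate_names("seti-100", ["HEH.Uni-Oldenburg.DE"]):
        print(fqdn)
```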

What's strange is that
A) seti-100 is in /etc/hosts (and nsswitch.conf is set up properly),
B) I don't have IPv6 enabled, so I'm not interested in "AAAA" records,
C) some tools do this (rlogin, telnet, ftp, rsh, tcpdump...) and some don't,
i.e. they work correctly (ping and ntpdate, for example, and host of course,
which doesn't use NetBSD's resolver, right?).

A dummy program using gethostbyname(3) works correctly as well.
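
A similar test can be run against getaddrinfo(3). The sketch below
assumes, without confirmation from the sources, that the affected tools
resolve names through getaddrinfo() with an unspecified address family;
if so, restricting the family to AF_INET should make the AAAA queries
disappear from the tcpdump output. The helper name addresses is
hypothetical.

```python
import socket


def addresses(host, family=socket.AF_UNSPEC):
    """Resolve `host` via getaddrinfo() and return the unique addresses found.

    AF_UNSPEC asks for every address family the library supports;
    AF_INET restricts the lookup to IPv4.
    """
    infos = socket.getaddrinfo(host, None, family, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the textual address.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    import sys

    host = sys.argv[1] if len(sys.argv) > 1 else "localhost"
    for label, family in (("AF_UNSPEC", socket.AF_UNSPEC),
                          ("AF_INET", socket.AF_INET)):
        try:
            print(label, addresses(host, family))
        except socket.gaierror as err:
            print(label, "lookup failed:", err)
```

Running it with tcpdump active in another shell shows which of the two
calls, if any, triggers AAAA queries.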

When our site, and thus my caching nameserver (seti-100), lost its
connection to the rest of the 'net this Saturday, name resolution took
ages.  (That was the reason I started investigating this :-)

Here's a lookup of a nonexistent host (not present in /etc/hosts):

WintelKiller-100.65376 > seti-100.domain:  19293+ AAAA? doesnotexist.HEH.Uni-Oldenburg.DE. (51)
seti-100.domain > WintelKiller-100.65376:  19293 NXDomain* 0/1/0 (119)
WintelKiller-100.65375 > seti-100.domain:  19294+ AAAA? doesnotexist. (30)
seti-100.domain > WintelKiller-100.65375:  19294 NXDomain 0/1/0 (103)
WintelKiller-100.65374 > seti-100.domain:  19295+ A? doesnotexist.HEH.Uni-Oldenburg.DE. (51)
seti-100.domain > WintelKiller-100.65374:  19295 NXDomain* 0/1/0 (119)
WintelKiller-100.65373 > seti-100.domain:  19296+ A? doesnotexist. (30)
seti-100.domain > WintelKiller-100.65373:  19296 NXDomain 0/1/0 (103)

It looks like the resolver does this in some cases:
1. sends IPv6 address requests (AAAA),
2. then follows the rules in /etc/nsswitch.conf.
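
The order seen in the trace of the nonexistent host can be written down
as a small model: every candidate name is tried for AAAA before any A
query is sent. The sketch below also gives a rough estimate of how long
one such lookup blocks when the nameserver does not answer in time. It
takes the 5000 ms (0x1388) poll timeout from the ktrace output and
assumes a single attempt per query; retries and back-off would increase
the figure. The function names are hypothetical.

```python
def query_plan(names, qtypes=("AAAA", "A")):
    """List (qtype, name) pairs in the order observed in the trace:
    every candidate name for AAAA first, then every name for A."""
    return [(qtype, name) for qtype in qtypes for name in names]


def blocking_time(plan, timeout_s=5.0, attempts=1):
    """Rough time in seconds spent waiting if every query in `plan` times out.

    `attempts` is an assumption; the real resolver's retry policy is
    not modelled here.
    """
    return len(plan) * timeout_s * attempts


if __name__ == "__main__":
    names = ["doesnotexist.HEH.Uni-Oldenburg.DE.", "doesnotexist."]
    plan = query_plan(names)
    for qtype, name in plan:
        print(f"{qtype}? {name}")
    print("seconds if all queries time out:", blocking_time(plan))
```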

I've quickly browsed the resolver sources in src/lib/libc/net looking for
a possibly undocumented nsswitch.conf option, but didn't find one.

I first noticed the problem with a world build from around 4th February, and
it still exists with yesterday's sources.

Here are parts of the ktrace output for "rsh seti-100 echo"
(the interesting ones, I hope):

  1190 rsh      NAMI  "/etc/nsswitch.conf"
  1190 rsh      RET   __stat13 0
  1190 rsh      CALL  open(0x480cf882,0,0x1b6)
  1190 rsh      NAMI  "/etc/hosts"
  1190 rsh      RET   open 3
  1190 rsh      CALL  __fstat13(0x3,0xefbfd53c)
  1190 rsh      RET   __fstat13 0
  1190 rsh      CALL  read(0x3,0x8050000,0x2000)
  1190 rsh      GIO   fd 3 read 64 bytes
       "127.0.0.1       localhost
        10.2.0.1        seti-100
        10.2.0.2        WintelKiller-100
       "
  1190 rsh      RET   read 64/0x40
  1190 rsh      CALL  read(0x3,0x8050000,0x2000)
  1190 rsh      GIO   fd 3 read 0 bytes
       ""
  1190 rsh      RET   read 0
  1190 rsh      CALL  close(0x3)
  1190 rsh      RET   close 0
  1190 rsh      CALL  madvise(0x8050000,0x2000,0x6)
  1190 rsh      RET   madvise 0
  1190 rsh      CALL  gettimeofday(0xefbfcd98,0)
  1190 rsh      RET   gettimeofday 0
  1190 rsh      CALL  getpid
  1190 rsh      RET   getpid 1190/0x4a6
  1190 rsh      CALL  open(0x480d1237,0,0x1b6)
  1190 rsh      NAMI  "/etc/resolv.conf"
  1190 rsh      RET   open 3
  1190 rsh      CALL  __fstat13(0x3,0xefbfccd8)
  1190 rsh      RET   __fstat13 0
  1190 rsh      CALL  read(0x3,0x8050000,0x2000)
  1190 rsh      GIO   fd 3 read 48 bytes
       "search HEH.Uni-Oldenburg.DE
        nameserver 10.2.0.1
       "
  1190 rsh      RET   read 48/0x30
  1190 rsh      CALL  read(0x3,0x8050000,0x2000)
  1190 rsh      GIO   fd 3 read 0 bytes
       ""
  1190 rsh      RET   read 0
  1190 rsh      CALL  close(0x3)
  1190 rsh      RET   close 0
  1190 rsh      CALL  madvise(0x8050000,0x2000,0x6)
  1190 rsh      RET   madvise 0
  1190 rsh      CALL  socket(0x2,0x2,0)
  1190 rsh      RET   socket 3
  1190 rsh      CALL  connect(0x3,0x480e18a0,0x10)
  1190 rsh      RET   connect 0
  1190 rsh      CALL  sendto(0x3,0xefbfc9c0,0x2f,0,0,0)
  1190 rsh      GIO   fd 3 wrote 47 bytes
       "gL\^A\0\0\^A\0\0\0\0\0\0\bseti-100\^CHEH\rUni-Oldenburg\^BDE\0\0\^\\0\
        \^A"
  1190 rsh      RET   sendto 47/0x2f
  1190 rsh      CALL  poll(0xefbfc778,0x1,0x1388)
  1190 rsh      RET   poll 1
  1190 rsh      CALL  recvfrom(0x3,0xefbfd25c,0x400,0,0xefbfc78c,0xefbfc774)
  1190 rsh      GIO   fd 3 read 115 bytes
       "gL\M^E\M^C\0\^A\0\0\0\^A\0\0\bseti-100\^CHEH\rUni-Oldenburg\^BDE\0\0\
        \^\\0\^A\^CHEH\rUni-Oldenburg\^BDE\0\0\^F\0\^A\0\^AQ\M^@\0$\^FServer\
        \M-@/\^Droot\M-@Ow5\M-b\M-U\0\08@\0\0\^N\^P\0   :\M^@\0\^AQ\M^@"
  1190 rsh      RET   recvfrom 115/0x73
  1190 rsh      CALL  close(0x3)
  1190 rsh      RET   close 0
  1190 rsh      CALL  socket(0x2,0x2,0)
  1190 rsh      RET   socket 3
  1190 rsh      CALL  connect(0x3,0x480e18a0,0x10)
  1190 rsh      RET   connect 0
  1190 rsh      CALL  sendto(0x3,0xefbfc9c0,0x1a,0,0,0)
  1190 rsh      GIO   fd 3 wrote 26 bytes
       "gM\^A\0\0\^A\0\0\0\0\0\0\bseti-100\0\0\^\\0\^A"
  1190 rsh      RET   sendto 26/0x1a
  1190 rsh      CALL  poll(0xefbfc778,0x1,0x1388)
  1190 rsh      RET   poll 1
  1190 rsh      CALL  recvfrom(0x3,0xefbfd25c,0x400,0,0xefbfc78c,0xefbfc774)
  1190 rsh      GIO   fd 3 read 99 bytes
       "gM\M^A\M^C\0\^A\0\0\0\^A\0\0\bseti-100\0\0\^\\0\^A\0\0\^F\0\^A\0\0"N\0\
        >\^AA\fROOT-SERVERS\^CNET\0
        hostmaster\binternic\M-@4w5\M-g4\0\0\a\b\0\0\^C\M^D\0   :\M^@\0\^AQ\
        \M^@"
  1190 rsh      RET   recvfrom 99/0x63
  1190 rsh      CALL  close(0x3)
  1190 rsh      RET   close 0
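
The first datagram written by rsh can be decoded by hand to confirm the
query type. A minimal sketch, with the 47 bytes transcribed from the
ktrace dump above (the transcription is an assumption about how the
escaped dump maps to raw bytes, and the helper name parse_query is
hypothetical; name compression is not handled, since this query does not
use it):

```python
import struct

# Bytes of the 47-byte sendto() payload, transcribed from the ktrace dump.
QUERY = (
    b"gL"                                  # ID
    b"\x01\x00"                            # flags: recursion desired
    b"\x00\x01\x00\x00\x00\x00\x00\x00"    # QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    b"\x08seti-100\x03HEH\x0dUni-Oldenburg\x02DE\x00"  # QNAME labels
    b"\x00\x1c"                            # QTYPE
    b"\x00\x01"                            # QCLASS
)

QTYPE_NAMES = {1: "A", 28: "AAAA"}


def parse_query(packet):
    """Return (id, qname, qtype, qclass) of the first question in a DNS query."""
    ident, _flags, _qdcount = struct.unpack_from("!HHH", packet, 0)
    offset = 12  # the fixed DNS header is 12 bytes long
    labels = []
    while True:
        length = packet[offset]
        offset += 1
        if length == 0:  # zero-length label marks the root
            break
        labels.append(packet[offset:offset + length].decode("ascii"))
        offset += length
    qtype, qclass = struct.unpack_from("!HH", packet, offset)
    return ident, ".".join(labels) + ".", qtype, qclass


if __name__ == "__main__":
    ident, qname, qtype, qclass = parse_query(QUERY)
    print(ident, qname, QTYPE_NAMES.get(qtype, qtype), qclass)
```

Query type 28 is AAAA, which matches what tcpdump reports for the
rlogin case.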

I hope this is enough information :-)

>How-To-Repeat:
"tcpdump -vv -i <interface> port domain" in one shell
and "rlogin <somehost>" etc in another one.

rlogin host
>Fix:
>Audit-Trail:
>Unformatted: