Subject: ypserv: discussion of libwrap vs securenets issues
To: None <tech-userlevel@netbsd.org>
From: Chuck Cranor <chuck@ece.cmu.edu>
List: tech-userlevel
Date: 11/01/2006 15:34:05
hi-

    in my spare time, i've been digging into some ypserv issues that
we've been seeing with NetBSD 3.0.  what we have been seeing is ypserv
running out of file descriptors (too many open TCP connections), getting
stuck in a tight loop, and no longer answering any queries.

    why does this happen to ypserv?   there are multiple reasons:

	1. we have a lot of linux clients, and linux makes a lot more
	   use of TCP connections to ypserv than *BSD does.
	   [details: glibc-2.3.5/nis/nss_nis/*c uses the yp_all(3)
	    API in places where we do not (e.g. initgroups(3) )]

	2. there was a bug in src/lib/libc/rpc/svc_vc.c (since fixed
	   in rev 1.20) where running out of file descriptors would
	   cause an infinite loop.

	3. ypserv would sometimes handle requests too slowly, and the
	   resulting backlog let the TCP-using linux clients run us out
	   of file descriptors.  i believe the reason for this slowdown
	   is libwrap overhead.


before going into some detail about the libwrap overhead, it is 
important to understand that ypserv is a single threaded server 
application.   it processes one request at a time.   if ypserv 
blocks on a request (e.g. waiting for I/O), then everything stops
until it unblocks.

even on a system that does not use /etc/hosts.{allow,deny},
for each ypserv request it:
	- stats /etc/nsswitch.conf
	- reads /etc/hosts
	- performs a blocking reverse (in-addr.arpa) DNS request
	- stats /etc/nsswitch.conf again
	- reads /etc/hosts again
	- performs a blocking forward ("A") DNS request
	- tries to open /etc/hosts.allow and /etc/hosts.deny
before it gets to actually reading the YP *.db files to answer the
request.
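the reverse-then-forward pair of lookups above is the usual "double
reverse" hostname check.  a minimal sketch of that pattern with the
standard getnameinfo(3)/getaddrinfo(3) calls is below, just to show where
the two blocking points are.  it is not libwrap's actual code, and
confirmed_hostname() is a made-up name; IPv4 only, for brevity.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Sketch of a "double reverse" check (hypothetical helper, not libwrap):
 * look up the name for a client address, then look up that name's
 * addresses and accept the name only if the client address is among
 * them.  Either call may read /etc/hosts and then block on DNS.
 * Returns 1 if the name is confirmed, 0 if not, -1 on a bad address.
 */
static int
confirmed_hostname(const char *ipstr, char *name, size_t namelen)
{
	struct sockaddr_in sin;
	struct addrinfo hints, *res, *ai;
	int ok = 0;

	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
#if defined(__NetBSD__) || defined(__FreeBSD__) || defined(__OpenBSD__)
	sin.sin_len = sizeof(sin);	/* BSD sockaddrs carry a length */
#endif
	if (inet_pton(AF_INET, ipstr, &sin.sin_addr) != 1)
		return -1;

	/* Blocking lookup #1: address -> name (reverse, in-addr.arpa). */
	if (getnameinfo((struct sockaddr *)&sin, sizeof(sin), name,
	    (socklen_t)namelen, NULL, 0, NI_NAMEREQD) != 0)
		return 0;

	/* Blocking lookup #2: name -> addresses (forward "A" lookup). */
	memset(&hints, 0, sizeof(hints));
	hints.ai_family = AF_INET;
	hints.ai_socktype = SOCK_STREAM;
	if (getaddrinfo(name, NULL, &hints, &res) != 0)
		return 0;

	for (ai = res; ai != NULL; ai = ai->ai_next) {
		const struct sockaddr_in *s =
		    (const struct sockaddr_in *)(void *)ai->ai_addr;

		if (s->sin_addr.s_addr == sin.sin_addr.s_addr) {
			ok = 1;
			break;
		}
	}
	freeaddrinfo(res);
	return ok;
}

int
main(void)
{
	char name[1025];	/* NI_MAXHOST-sized buffer */

	if (confirmed_hostname("127.0.0.1", name, sizeof(name)) == 1)
		printf("127.0.0.1 is %s\n", name);
	else
		printf("127.0.0.1 has no confirmed name\n");
	return 0;
}
```

a single-threaded server calling something like this for every request
inherits the worst-case latency of both lookups.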

in the above, the 2 blocking DNS queries are especially problematic,
since any sort of UDP glitch to named is going to stop yp service
for a number of seconds [e.g. think about the UDP timeout/retransmit
in res_send(3)].   while ypserv is blocked, TCP connections from linux
clients [doing initgroups(3)?] will build up.

[[ a possible additional problem here is that it is possible to 
   configure nsswitch.conf to use YP for hostname lookups.  i believe
   that makes it possible for the libwrap code in ypserv to issue a
   YP request to ypserv itself...   that could make an interesting 
   loop!   fortunately, the default 'hosts' line in nsswitch.conf 
   is 'hosts: files dns' ]]


older versions of ypserv did not have this problem because they used
their own address-only access control file (/var/yp/securenets).
since securenets is address-only, no blocking DNS queries are required.
NetBSD switched over from /var/yp/securenets to libwrap in 1999.


what should we do about ypserv and libwrap?   

 - one option is to go back to /var/yp/securenets.  that would avoid
   the blocking DNS requests and provide more backward compatibility (most
   ypservs support securenets).   but our old securenets code does not
   support IPv6 addrs (libwrap does, I think), and since we already support
   libwrap, switching back could be confusing.

 - another option would be to modify libwrap so that it had a mode where
   it did not do DNS or YP requests in the critical path.   currently
   libwrap does "unsafe for ypserv" operations like:
      1. hostname lookups [e.g. in sock_hostname()]
      2. YP netgroup lookups [e.g. innetgr(3) calls in hosts_match() ]
      3. YP protocol lookups [e.g. getprotobyname(3) in fix_options.c ]
      4. libwrap "options" like "spawn" and "rfc931" (blocking ident
         request in the critical path) could be bad too.

   if we could turn those off and declare that ypserv would only take
   address-based hosts_access specs in /etc/hosts.{allow,deny}, then
   we could continue to use libwrap (but we would not be able to specify
   hostnames for rules that apply to ypserv).


how should we address this problem?


chuck