Subject: Re: 3.0 YP lookup latency
To: Christos Zoulas <christos@zoulas.com>
From: None <jonathan@dsg.stanford.edu>
List: tech-net
Date: 06/21/2006 09:42:15
In message <20060621130038.CA56256534@rebar.astron.com>, Christos Zoulas writes:
>On Jun 20,  2:48pm, smj@cirr.com (Stephen Jones) wrote:
>-- Subject: Re: 3.0 YP lookup latency
>
>| Okay, this daily build (netbsd-3-0/200605220000Z) is displaying the
>| symptoms. It doesn't appear to be in rpcbind or ypbind itself, but an
>| associated library. It seems when the query is made the ypserv process
>| on the server will log each user id in the password file when the -l
>| flag is on.  Interestingly the userids are sorted in alphabetical
>| order .. any reason for this? ;-)
>
>My guess is that it is an artifact of the passwd file being hashed into
>a db file. I guess I'll have to set up a yp domain myself and test.

hi Christos,

I don't buy your guess. As Stephen clarified (after the message to
which you replied):

On Stephen's NetBSD-2.1 NIS client hosts, the client is issuing a
single yp_match() call to the server.  But on Stephen's 3.0 clients,
running the same userland tool (which, barring explicit size_t casts,
hasn't changed between NetBSD-2 and NetBSD-3), the client iterates over
Stephen's entire 27,000-entry NIS passwd.byname map via yp_first()/yp_next().
That's where the tens of seconds go: not into any one RPC call,
but into the 27,000-odd yp_next() calls.
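To make the cost concrete, here is a minimal C sketch that models each
client/server RPC as one counted round trip against a hypothetical
in-memory map.  The names (struct fake_map, map_match(), map_scan_for())
are invented for illustration and are not the <rpcsvc/ypclnt.h> API; the
only point is that a keyed, yp_match()-style lookup costs one round trip
regardless of map size, while a yp_first()/yp_next()-style walk costs one
round trip per entry plus one more to reach the end of the map.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stand-in for an NIS map such as passwd.byname.
 * Each call below models one client/server RPC round trip; none of these
 * names are the real <rpcsvc/ypclnt.h> interface. */
struct fake_map {
    const char **keys;
    size_t nkeys;
    unsigned long round_trips;   /* RPCs issued so far */
};

/* Keyed lookup in the spirit of yp_match(): exactly one round trip.
 * The search itself happens "on the server", so the client pays once. */
static int
map_match(struct fake_map *m, const char *key)
{
    size_t i;

    m->round_trips++;
    for (i = 0; i < m->nkeys; i++)
        if (strcmp(m->keys[i], key) == 0)
            return 1;
    return 0;
}

/* Enumeration in the spirit of yp_first()/yp_next(): one round trip per
 * entry, plus a final call that reports end-of-map.  Like the 3.0 client
 * Stephen observed, this sketch walks the whole map, even after a hit. */
static int
map_scan_for(struct fake_map *m, const char *key)
{
    size_t i;
    int found = 0;

    for (i = 0; i <= m->nkeys; i++) {   /* nkeys + 1 calls in total */
        m->round_trips++;
        if (i < m->nkeys && strcmp(m->keys[i], key) == 0)
            found = 1;
    }
    return found;
}
```

For a 27,000-entry map that is 1 round trip versus 27,001.  If each round
trip took, say, 1 ms (an assumed figure, not one Stephen measured), the
walk alone would take about 27 seconds.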

I'm pretty sure the bug is in the client NIS library. If you look at
	lib/libc/gen/getpwent.c

you will notice that the file changed radically between the netbsd-2
and netbsd-3 CVS branches.  getpwent.c only calls yp_match() or
yp_first()/yp_next() in a couple of places.  Late last night I emailed
you, Stephen and Soda-san a walk-through of the relevant code paths. I
think I've identified the problem, and I suggested a workaround: don't
use the supplied default nsswitch.conf

	passwd:  compat

line, but instead use

	passwd: nis [notfound=return] files

which avoids the compat_ parsing routine where I'm pretty sure this bug
resides.  I also suggested a fix (assuming the workaround does fix
Stephen's problem).
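For anyone unfamiliar with the bracketed criteria, here is a rough C
model of how a switch line such as "passwd: nis [notfound=return] files"
is walked, following the default actions documented in nsswitch.conf(5):
stop on success, otherwise continue to the next source, unless a
criterion like [notfound=return] says to stop.  The type and function
names and the two stub sources are hypothetical; this is a sketch of the
semantics, not libc's nsdispatch() code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical lookup outcomes, modelled on the nsswitch.conf(5) keywords. */
enum lookup_status { ST_SUCCESS, ST_NOTFOUND, ST_UNAVAIL, ST_TRYAGAIN };

/* One source on a switch line, e.g. "nis [notfound=return]" or "files". */
struct source {
    const char *name;
    enum lookup_status (*lookup)(const char *key);
    int return_on_notfound;   /* 1 if the line says [notfound=return] */
};

/* Walk the sources in order.  Default criteria: return on success and
 * continue on everything else; [notfound=return] makes NOTFOUND final. */
static enum lookup_status
resolve(const struct source *src, size_t n, const char *key, size_t *consulted)
{
    enum lookup_status st = ST_UNAVAIL;   /* result if no source is listed */
    size_t i;

    *consulted = 0;
    for (i = 0; i < n; i++) {
        st = src[i].lookup(key);
        (*consulted)++;
        if (st == ST_SUCCESS)
            return st;
        if (st == ST_NOTFOUND && src[i].return_on_notfound)
            return st;
    }
    return st;   /* every source failed: report the last status */
}

/* Hypothetical stub sources: NIS knows "alice", the local files know "root". */
static enum lookup_status
nis_stub(const char *key)
{
    return strcmp(key, "alice") == 0 ? ST_SUCCESS : ST_NOTFOUND;
}

static enum lookup_status
files_stub(const char *key)
{
    return strcmp(key, "root") == 0 ? ST_SUCCESS : ST_NOTFOUND;
}
```

With the chain { "nis", nis_stub, 1 }, { "files", files_stub, 0 }, a
name that NIS answers as notfound never reaches files in this model;
files is consulted only when nis fails in some other way (for example,
when it is unavailable).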


In message <20060621130447.2B24C56534@rebar.astron.com>,
Christos Zoulas continued:

>Sure I would be happy to work with you to resolve this. The first thing
>to do is to ktrace both the server and the client process and then do
>a kdump -R to see between which 2 system calls we have the most delay.

I understand why you ask (I initially asked for a libpcap trace).
But given Stephen's observation about his NIS server logs, I don't
think either one would help.  yp_match() and yp_first()/yp_next() are
libc functions, not system calls, so a ktrace would only show the
read()/write() calls underlying the 27,000-odd yp_next() calls which we
already know (from Stephen's server-side logs) the NetBSD-3.0 NIS client
is issuing.
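As a toy illustration of that point, the sketch below wraps one
request/reply exchange in a hypothetical library function,
fake_yp_next(), over an AF_UNIX datagram socketpair, with the "server"
side played in the same process.  A syscall tracer such as ktrace would
see only the write() and read() calls, never the library function's
name, so 27,000 lookups show up as 27,000 write/read pairs on the client
side.  The real RPC transport may use other calls (e.g. sendto() and
recvfrom()), but the trace would presumably have the same shape: one
request/reply pair per yp_next().

```c
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Client-side system calls, i.e. roughly what ktrace would log for the client. */
static unsigned long n_write, n_read;

/* Hypothetical stand-in for one yp_next()-style call: send a request, get a
 * reply.  The "server" is played in-process purely for illustration. */
static int
fake_yp_next(int client, int server, char *reply, size_t len)
{
    char req[64];

    if (write(client, "next", 4) != 4)          /* client: request out */
        return -1;
    n_write++;
    if (read(server, req, sizeof req) < 0 ||    /* server: read request */
        write(server, "entry", 5) != 5)         /* server: send reply */
        return -1;
    if (read(client, reply, len) < 0)           /* client: reply in */
        return -1;
    n_read++;
    return 0;
}

/* Issue `calls` fake lookups over an AF_UNIX datagram socketpair. */
static int
simulate(unsigned long calls)
{
    int sv[2], rc = 0;
    char buf[64];
    unsigned long i;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;
    for (i = 0; i < calls && rc == 0; i++)
        rc = fake_yp_next(sv[0], sv[1], buf, sizeof buf);
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```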