Subject: Re: 3.0 YP lookup latency
To: None <firstname.lastname@example.org>
From: Christos Zoulas <email@example.com>
Date: 06/21/2006 12:52:49
On Jun 21, 9:42am, firstname.lastname@example.org (email@example.com) wrote:
-- Subject: Re: 3.0 YP lookup latency
| In message <20060621130038.CA56256534@rebar.astron.com> Christos Zoulas writes
| >On Jun 20, 2:48pm, firstname.lastname@example.org (Stephen Jones) wrote:
| >-- Subject: Re: 3.0 YP lookup latency
| >| Okay, this daily build (netbsd-3-0/200605220000Z) is displaying the
| >| symptoms. It doesn't appear to be in rpcbind or ypbind itself, but
| >| an associated library.
| >| It seems when the query is made the ypserv process on the server
| >| will log each user id in the password file when the -l flag is on.
| >| Interestingly the userids are sorted in alphabetical order .. any
| >| reason for this? ;-)
| >My guess is that it is an artifact of the passwd file being hashed into
| >a db file. I guess I'll have to setup a yp domain myself and test.
| hi Christos,
| I don't buy your guess. As Stephen clarified (after the message to
| which you replied):
| On Stephen's NetBSD-2.1 NIS client hosts, the client is issuing a
| single yp_match() call to the server. But on Stephen's 3.0 clients,
| running the same userland tool (which, barring explicit size_t casts,
| hasn't changed between NetBSD-2 and NetBSD-3), the client iterates over
| Stephen's entire 27,000-entry NIS passwd.byname map, via yp_first()/yp_next().
| That's where the tens of seconds bites: not one individual RPC call
| but the 27,000-odd yp_next() calls.
| I'm pretty sure the bug is the client NIS library. If you look at
| you will notice that file changed radically between (CVS branches)
| netbsd-2 and netbsd-3. getpwent.c only calls yp_match() or
| yp_first()/yp_next() in a couple of places. Late last night I emailed
| you and Stephen and Soda-san a walk through the relevant code-paths. I
| think I've identified the problem, and suggested a workaround: don't
| use the supplied default nsswitch.conf
| passwd: compat
| line, but instead use
| passwd: nis [notfound=return] files
| which avoids the compat_ parsing routine where I'm pretty sure this bug
| resides. I also suggested a fix (assuming the workaround does fix
| Stephen's problem).
| In message Message-Id: <20060621130447.2B24C56534@rebar.astron.com>,
| Christos Zoulas continued:
| >Sure I would be happy to work with you to resolve this. The first thing
| >to do is to ktrace both the server and the client process and then do
| >a kdump -R to see between which 2 system calls we have the most delay.
| I understand why you ask (I initially asked for a libpcap trace).
| But, given Stephen's observation about his NIS-server logs I don't
| think either one would help. yp_match() and yp_first()/yp_next() are
| libc functions, not system calls. So a ktrace would show the
| read()/write() calls for the 27,000-odd yp_next() calls which we know
| (from Stephen's server-side logs) the NetBSD-3.0 NIS client is issuing.
Yes, it should show the difference between the number of calls in 3.x
and in 2.x. But I am convinced that the reason is what you mentioned;
it should not iterate through the whole map.
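
To make the cost difference concrete, here is a rough sketch of why a
keyed lookup and a full enumeration behave so differently on a large
map. It is only a toy model: struct mock_map, mock_match() and
mock_scan() are hypothetical names invented for illustration, not the
ypclnt API, and the rpc_calls counter just stands in for round trips
to ypserv. It also ignores the extra end-of-map call a real
yp_first()/yp_next() walk would make.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stand-in for a NIS map (not the ypclnt API). */
struct mock_map {
    const char **keys;  /* entry keys, e.g. login names */
    size_t nkeys;       /* number of entries in the map */
    size_t rpc_calls;   /* simulated round trips to the server */
};

/* Models a yp_match()-style keyed lookup: one round trip, and the
 * search itself happens on the server side. */
int mock_match(struct mock_map *m, const char *key)
{
    assert(m != NULL && key != NULL);
    m->rpc_calls++;     /* one request regardless of map size */
    for (size_t i = 0; i < m->nkeys; i++)
        if (strcmp(m->keys[i], key) == 0)
            return 1;
    return 0;
}

/* Models a yp_first()/yp_next()-style walk: one round trip per entry
 * fetched, with the comparison done on the client. */
int mock_scan(struct mock_map *m, const char *key)
{
    assert(m != NULL && key != NULL);
    for (size_t i = 0; i < m->nkeys; i++) {
        m->rpc_calls++; /* one request per entry visited */
        if (strcmp(m->keys[i], key) == 0)
            return 1;
    }
    return 0;
}
```

With a four-entry map, looking up the last key costs one simulated
call through mock_match() and four through mock_scan(). Scale the
same ratio up to a 27,000-entry passwd.byname map and the tens of
seconds reported on the 3.0 clients are about what one would expect
from the enumeration path.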