Subject: Re: pwd_mkdb - optimisation.
To: NetBSD User's Discussion List <netbsd-users@NetBSD.ORG>
From: Greg A. Woods <woods@weird.com>
List: current-users
Date: 08/11/2001 00:32:43
[ On Friday, August 10, 2001 at 17:16:51 (-0500), Stephen M Jones wrote: ]
> Subject: Re: pwd_mkdb - optimisation.
>
> That is true .. when I was first exposed to bsd's passwd database ideas
> years ago, I understood why .. there are trade offs.

Yes, indeed there are many trade-offs in handling data that's both
very frequently accessed and frequently updated!  ;-)

Normally one might assume that changes to account information would
happen relatively infrequently, and thus that adding a simply maintained
hash table to speed lookups in large passwd files is a reasonable choice.

Certainly in some types of systems, such as some I help manage where the
changes to /etc/master.passwd et al come primarily from a central
database system on cron-driven schedules, these trade-offs are very easy
to manage.

However I certainly agree that in more dynamic stand-alone systems where
only the native tools are used, those native tools are somewhat limited
in their capabilities (even while they strive to reduce the scaling
issues for the still more frequent uses of the data in question).

>  Its unfortunate 
> that user accounts are stored in 'too many' ways as they are with a slew
> of temporary files (pw.*a, ptmp, ptmp.orig, pwd.db.tmp, spwd.db.tmp) ..
> its a 'unix' way, but its also extremely krufty ..

I'm not so sure there's anything wrong on this front.  It's good to have
this so frequently used data available in several convenient formats.
The flat files are still extremely useful for sequential processing, and
even some native system tools use this data in that way.

Though the current .db files appear to contain all the data fields,
there may be several arguments for creating them as pure indexes too,
with just offset pointers into the flat files (along with perhaps record
length pointers too) being stored along with the key values.
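
To make the pure-index idea concrete, here is a minimal sketch in
Python.  It is only an illustration, not how pwd_mkdb works today: the
function names are made up, a plain dict stands in for the .db file, and
the flat file is assumed to hold one colon-separated record per line
with the login name first.

```python
def build_offset_index(path):
    """Map each login name to the (offset, length) of its line in a flat file."""
    index = {}
    offset = 0
    with open(path, "rb") as f:
        for line in f:
            # The key is the first colon-separated field (the login name).
            name = line.split(b":", 1)[0].decode()
            index[name] = (offset, len(line))
            offset += len(line)
    return index


def lookup(path, index, name):
    """Fetch one record by seeking straight to its stored offset."""
    offset, length = index[name]
    with open(path, "rb") as f:
        f.seek(offset)
        record = f.read(length).decode()
    return record.rstrip("\n").split(":")
```

The index then holds only a key and two integers per account, at the
cost of one extra seek and read in the flat file on every lookup.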

> I was hoping by 
> dumping fuel on an old thread, that we could all work together to 
> resolve this as its a big issue for large user base systems.  My kludge
> could work for many and could be acceptable.. its what I'll use for
> the interim.  I would love to have a final hack, I'm sorry that I just
> can't provide one :(

I certainly have no argument against any of the proposals to make the
*.db files incrementally updatable.  I suspect this one apparently
relatively simple change will raise the bar on the scaling issue high
enough that it becomes a non-issue again.
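
As a rough sketch of what an incremental update amounts to (again in
Python, with a plain dict standing in for the .db file, and with a
hypothetical key layout of one entry by name and one by uid rather than
the real on-disk format): only the keys belonging to the changed account
are touched, and nothing else is rebuilt.

```python
def keys_for(line):
    """Return the illustrative lookup keys for one passwd-style record."""
    fields = line.split(":")
    return [("name", fields[0]), ("uid", fields[2])]


def update_user(db, old_line, new_line):
    """Apply one account change to db without rebuilding the whole table.

    old_line is None when adding an account; new_line is None when
    removing one.
    """
    if old_line is not None:
        # Drop every key that pointed at the old version of the record.
        for key in keys_for(old_line):
            db.pop(key, None)
    if new_line is not None:
        # Store the new version under each of its keys.
        for key in keys_for(new_line):
            db[key] = new_line
```

The work per change is then proportional to the number of keys per
account, not to the number of accounts.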

Obviously adjusting the hash parameters for optimum performance on a
given sized system is always a good idea too.  Making these
parameters run-time tunable, and perhaps even coming up with algorithms
for automatically tuning them, is probably good too.
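
One very simple form such automatic tuning could take is sketched below
in Python.  The result names nelem, ffactor and bsize follow the fields
of db's hash access method, but the heuristic itself (size the table for
the expected number of keys, and pick a fill factor from how many
average-sized records fit in one bucket) is my own assumption for
illustration, not a documented formula; so are the function name and the
default of three keys per record.

```python
def suggest_hash_params(lines, keys_per_record=3, bsize=4096):
    """Guess hash table parameters from the current flat-file contents."""
    records = [line for line in lines if line.strip()]
    if not records:
        return {"nelem": 1, "ffactor": 1, "bsize": bsize}
    # Mean record size in characters, counting the trailing newline.
    mean_len = sum(len(r) for r in records) / len(records)
    # Expected final number of keys in the table.
    nelem = len(records) * keys_per_record
    # Roughly how many average-sized records fit in one bucket.
    ffactor = max(1, int(bsize // mean_len))
    return {"nelem": nelem, "ffactor": ffactor, "bsize": bsize}
```

Any real tuning would of course need to be checked against measured
lookup and rebuild times on the system in question.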

I haven't recently thought hard enough about this issue to have any
opinion on whether converting to some other indexing scheme
(e.g. btree) is a good idea or not.

I doubt any real magic need be applied to improving the performance of
the flat-file updates (beyond, say, updating fixed-length fields in
place).  Even with the worst-case maximum record length, a system capable
of handling 100k users demanding quick updates to passwd(5) info should
be able to copy-edit the master.passwd file in less than one second on
average (i.e. from/to stable secondary disk storage).
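
By "copy-edit" I mean something like the following minimal Python
sketch: stream the file to a temporary copy, substitute the one changed
record on the way through, force the copy to disk, and rename it into
place.  The function name and the ".tmp" suffix are hypothetical, and
the sketch deliberately ignores the locking and file-permission handling
that any real tool must do.

```python
import os


def copy_edit(path, name, new_line):
    """Rewrite the flat file at path, replacing the record for name.

    Returns True if a record was replaced, False if name was not found.
    """
    tmp = path + ".tmp"
    replaced = False
    with open(path, "r") as src, open(tmp, "w") as dst:
        for line in src:
            if line.split(":", 1)[0] == name:
                dst.write(new_line.rstrip("\n") + "\n")
                replaced = True
            else:
                dst.write(line)
        # Make sure the new copy is on stable storage before the rename.
        dst.flush()
        os.fsync(dst.fileno())
    if not replaced:
        os.remove(tmp)
        return False
    # Atomically swap the new copy into place.
    os.replace(tmp, path)
    return True
```

The cost is one sequential read and one sequential write of the whole
file, which is why the file size, not the update logic, sets the limit.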

-- 
							Greg A. Woods

+1 416 218-0098      VE3TCP      <gwoods@acm.org>     <woods@robohack.ca>
Planix, Inc. <woods@planix.com>;   Secrets of the Weird <woods@weird.com>