tech-userlevel archive


Re: db(3) removal and lastlogx

On Sun, Jul 22, 2012 at 08:27:27AM +0200, Alistair Crooks wrote:
> On Sat, Jul 21, 2012 at 11:28:38PM +0000, Christos Zoulas wrote:
> > In article <>,
> > Alistair Crooks  <> wrote:
> > >
> > >Well, in lieu of any supporting arguments for the migration of db to cdb
> > >format, let's revert them all.
> > 
> > Aside the compatibility issues (which I believe are mostly fine), the cdb
> > changes for the read-only databases is a strict improvement.
> Creates or reads, or both?

Both. I think we talked enough about creation already. The read path in
cdbr is a slightly fancy memory-mapped hash table. No locking is
involved and all processes directly use the kernel file cache. db185
doesn't use mmap and has its own set of userland buffers. I imagine the
resulting thread-safety issues are one of the reasons for the "giant
lock" in the nsdispatch layer, to name only one popular place.
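
As a rough illustration of why such a read path needs no locking, here is
a minimal sketch of a lookup in a read-only memory-mapped table. It is not
the cdbr implementation or its file format: the slot layout, the FNV-1a
hash, the linear probing and all toy_* names are assumptions made up for
this example.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical toy format: the file is an array of fixed-size slots. */
struct toy_slot {
	char key[16];	/* NUL-terminated; an empty key marks a free slot */
	char val[16];	/* NUL-terminated */
};

struct toy_db {
	const struct toy_slot *slots;	/* read-only mapping of the file */
	size_t nslots;
};

/* FNV-1a, used only as a convenient stand-in hash function. */
static uint32_t
toy_hash(const char *s)
{
	uint32_t h = 2166136261u;

	while (*s != '\0') {
		h ^= (unsigned char)*s++;
		h *= 16777619u;
	}
	return h;
}

/* Map the whole file read-only. Returns 0 on success, -1 on failure. */
static int
toy_open(struct toy_db *db, const char *path)
{
	struct stat st;
	void *p;
	int fd;

	assert(db != NULL && path != NULL);
	if ((fd = open(path, O_RDONLY)) == -1)
		return -1;
	if (fstat(fd, &st) == -1 || st.st_size <= 0 ||
	    (size_t)st.st_size % sizeof(struct toy_slot) != 0) {
		close(fd);
		return -1;
	}
	/* The pages come straight from the kernel file cache. */
	p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
	close(fd);	/* the mapping stays valid after close() */
	if (p == MAP_FAILED)
		return -1;
	db->slots = p;
	db->nslots = (size_t)st.st_size / sizeof(struct toy_slot);
	return 0;
}

/* Returns a pointer into the mapping, or NULL if the key is absent. */
static const char *
toy_find(const struct toy_db *db, const char *key)
{
	size_t i, start = toy_hash(key) % db->nslots;

	for (i = 0; i < db->nslots; i++) {
		const struct toy_slot *s =
		    &db->slots[(start + i) % db->nslots];

		if (s->key[0] == '\0')
			return NULL;	/* free slot: key is not stored */
		if (strcmp(s->key, key) == 0)
			return s->val;
	}
	return NULL;
}

static void
toy_close(struct toy_db *db)
{
	munmap((void *)db->slots, db->nslots * sizeof(struct toy_slot));
}
```

Since the mapping is never written, toy_find() touches no shared mutable
state and the returned pointer stays valid until toy_close().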

> Recovery was another issue which was flagged under db1 as being
> problematic - how is it done for cdb?  How is it superior?  How
> does this relate to a readonly db? For readwrite dbs like the passwd
> ones?
> Transactions - how do they relate to a readonly database? I've seen
> that used as justification too.

Just to reduce the general amount of confusion: both points (recovery
and transaction handling) are about db185 being outdated in general, not
about read-only databases. Not even passwd is really a read-write
database; it is still copied first at the very least.
The only consideration in the context of read-only databases is whether
to fsync() at the end of creation or not.
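
For illustration, the fsync() question can be pictured with the usual
"write under a temporary name, then rename into place" pattern. This is
only a sketch under that assumption, not the code of any of the actual
database builders; write_db_file() and its do_fsync switch are
hypothetical names.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Write len bytes to tmppath, optionally fsync(), then rename the file
 * to path. Readers either see the old file or the complete new one.
 * Returns 0 on success, -1 on failure (the temporary file is removed).
 */
static int
write_db_file(const char *path, const char *tmppath, const void *buf,
    size_t len, int do_fsync)
{
	const char *p = buf;
	int fd;

	assert(path != NULL && tmppath != NULL);
	if ((fd = open(tmppath, O_WRONLY | O_CREAT | O_TRUNC, 0644)) == -1)
		return -1;
	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n == -1)	/* simplification: EINTR is not retried */
			goto fail;
		p += n;
		len -= (size_t)n;
	}
	/* This is the optional step: force the data to stable storage. */
	if (do_fsync && fsync(fd) == -1)
		goto fail;
	if (close(fd) == -1) {
		unlink(tmppath);
		return -1;
	}
	if (rename(tmppath, path) == -1) {
		unlink(tmppath);
		return -1;
	}
	return 0;
fail:
	close(fd);
	unlink(tmppath);
	return -1;
}
```

Without the fsync(), a crash shortly after the rename may leave a file
whose content never reached the disk; with it, creation is slower.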

> And the compat issues - since we'll have to keep the db1 code in libc
> - they're kinda difficult, especially if we have any statically-linked
> programs which use termcap/terminfo, or the user databases, or
> services, etc.

No, I don't think that requires keeping the db1 code in libc. The code
for writing compat versions of the libraries is in the various helper
programs. So while we can't remove db1 completely without breaking
compatibility in some form, it can certainly be moved out of libc and
made optional.

That leaves the last point: why not provide and use a db185-compatible
interface? The first concern is that the conversion process
makes a really good time to verify that the file content is actually up
to the NetBSD standard, e.g. platform neutral. There were a lot of
issues in that area and pwd.db is still open. The second concern is
about the db185 database API. Emulating it requires storing the keys
explicitly in the database. In most cases, this duplicates variable
length data. While the disk space might be cheap, less dense files are
also less likely to be cached. Access to specific record numbers
seems to be only providable by (ab)using the key argument for the seq
access. The concept is used quite a bit since it avoids storing another
set of keys if aliases are desired. Life cycle is another interesting
issue. The normal db185 API makes no guarantees about the result buffer
after the next call of get/put/whatever. This means at the very least
that the existing code is not able to exploit useful properties such as
a result buffer that stays valid across calls.

Comparing code using cdbr_open / cdbr_find / cdbr_get with the
corresponding code using dbopen / get, the cdbr code has the additional
burden of validating that the returned data is the desired entry. In
many cases the key was already part of the actual data, so this mostly
involves an additional compare after demarshalling to figure out if an
entry was found or not. This is normally compensated by the much more
verbose code needed for get() and the often-needed local buffer for the
data copy in the case of get().
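
The validation step might look roughly like the following sketch. It
assumes the lookup has already produced a data pointer and length, and it
assumes a made-up record layout (NUL-terminated name followed by the
value bytes); check_entry() is a hypothetical helper, not part of the
cdbr API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical record layout: NUL-terminated name, then the value bytes.
 * Returns a pointer to the value and stores its length in *vallen, or
 * returns NULL if the record does not belong to the requested key.
 */
static const void *
check_entry(const void *data, size_t datalen, const char *key,
    size_t *vallen)
{
	const char *rec = data;
	size_t keylen;

	assert(key != NULL && vallen != NULL);
	keylen = strlen(key);
	/* The record must at least hold the key and its terminator. */
	if (rec == NULL || datalen < keylen + 1)
		return NULL;
	/* Compare including the NUL so that prefixes do not match. */
	if (memcmp(rec, key, keylen + 1) != 0)
		return NULL;	/* the lookup landed on another entry */
	*vallen = datalen - (keylen + 1);
	return rec + keylen + 1;
}
```

The returned pointer refers to the original record, so no local copy of
the data is needed.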

There are three db185 users remaining in libc. cgetcap and friends are
primarily left as I am still trying to wrap my mind around the code and
not liking the resulting topology. I also have to check how the
interface is commonly used to decide if it makes sense to provide
indexed access to individual attributes or not. passwd(5) is the second
remaining constant database. I'm still trying to decide if I want to
tackle the conversion first or fix the nsdispatch code base to actually
be modular. The various hacks for NIS in that code are also adding some
complication. If someone wants to write an ATF testbed for NIS, it would
certainly be appreciated and useful for either task. This leaves the
lastlog database that started this thread. I don't have the time to
write a proof-of-concept program that tries to slow down login(1) enough
that SIGSTOP can interrupt it while holding the lastlog(x) lock to
prevent other users from logging in. I think it is possible, at least on
SMP systems. I haven't had time to fully sit down and implement one of
the semi-sparse atomic update schemes yet, but I feel confident that it
can be done. E.g. the database will be somewhat sparse using 64K chunks
of the UID space, but only chunks in use will be in the file. As long as
the UID distribution is dense, one or two chunks should be present in
the file with the associated data; for two chunks the apparent file size
would be somewhere in the area of 64MB.

