tech-userlevel: Re: Text form of wtmp

Subject: Re: Text form of wtmp
To: Valentin Nechayev <netch@lucky.net>
From: Greywolf <greywolf@starwolf.com>
List: tech-userlevel
Date: 02/16/2005 14:37:57
First of all, to all the smart alecks out there who don't understand why
wtmp is the way it is (ditto wtmpx): if all you can think of to do is offer
witty repartee comparing /etc/passwd and /etc/group to wtmp[x] (in fact
the former is already db-indexed for performance reasons), I suggest you
take a look at the way the system is laid out.  It hasn't stayed like this
for 30 years without good reason.  I'm sure that, had it been deemed
important to change it, this particular paradigm would have changed
back around the transition between 4.2 BSD and 4.3 BSD.

Secondly, when I speak of "efficiency", I am still thinking of
all the obsolete hardware that we still support.

Next, if you want to argue a valid comparison, you might point, instead,
to the "new" directory structure of the original FFS vs. the 14-char limit
imposed by the old SysV filesystem.  Yes, things became more variable
there, but a lot of mangling had to be done.  Instead of all directory
entries fitting conveniently into 16 bytes apiece (14 chars + a 2-byte
unsigned inode number), the filesystem effectively had to length-encode
each entry -- hence d_ino, d_name, d_namelen, d_next and, eventually,
getdents().
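
To make the fixed vs. variable trade-off concrete, here is a minimal sketch
in C.  The struct and function names (fixed_dirent, var_dirent_hdr,
d_reclen, d_namlen and friends) are illustrative assumptions, not the
actual on-disk definitions of any particular filesystem.  It only shows
that entry n of a fixed format sits at n * 16, while a variable format has
to be walked record by record.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Old-style fixed entry: 2-byte inode number + 14-byte name = 16 bytes. */
struct fixed_dirent {
    uint16_t d_ino;
    char     d_name[14];    /* NUL-padded, not necessarily NUL-terminated */
};

/* Entry n of a fixed-format directory is found by plain arithmetic. */
size_t fixed_offset(size_t n)
{
    return n * sizeof(struct fixed_dirent);
}

/* Hypothetical variable-length header; the name bytes follow it. */
struct var_dirent_hdr {
    uint32_t d_ino;
    uint16_t d_reclen;      /* total record size, header + name + padding */
    uint16_t d_namlen;      /* length of the name, without the NUL */
};

/* Record size for a name of namlen bytes: header + name + NUL,
 * rounded up to a multiple of 4 (an assumed alignment). */
size_t var_reclen(size_t namlen)
{
    size_t raw = sizeof(struct var_dirent_hdr) + namlen + 1;
    return (raw + 3) & ~(size_t)3;
}

/* Entry n of a variable-format directory: every earlier record has to be
 * visited, because each one says how far away the next one is. */
size_t var_offset(const unsigned char *buf, size_t n)
{
    size_t off = 0;

    while (n-- > 0) {
        struct var_dirent_hdr h;
        memcpy(&h, buf + off, sizeof h);
        off += h.d_reclen;
    }
    return off;
}
```

With these definitions, fixed_offset(3) is simply 48, whereas var_offset()
needs the directory contents in hand before it can answer at all.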

[Thus spake Valentin Nechayev ("VN: ") 2:12pm...]

VN: It's easier for too many tasks to use binary formats, protocols and
VN: fixed-size records, but it's not Unix world style and not Internet world
VN: style.

Then I dare say the Unix World and the Internet World have become far
too spoiled and lazy, expecting everything to be text-formatted and handed
to them.  It is a greater pain to take massaged text output and
convert it into something that is more easily maintained and accessed
than it is to go the other way, in case you haven't figured that one
out by now...

VN: > We would not gain anything by converting wtmp from fixed-size-record
VN: > format to text format.  Humans are not the only things who gain from
VN: > reading the information inside wtmp.
VN:
VN: Well, what about changing of /etc/group to binary format?;)) It will be much
VN: easier to parse, and this is really needed (initgroups() can be called
VN: hundreds per second), unlike quite rare reading of wtmp...

Show me a profile where initgroups() is a bottleneck, then submit a PR --
having a group.byuser.db and group.bygroup.db alongside /etc/group might
prove to be a win in that case.  wtmpx* and potentially acct are both
written to about twice as often as initgroups() is called.

VN: I can imagine arguments to keep status quo - binary wtmp - due to POLA
VN: principle. But please don't argue to binary logs in Unix world, it has no
VN: sense :)

You may have a point there, but only if you are talking about logs which
are not used/decoded by many different processes for many different purposes.

When you're dealing with system accounting procedures, be it lastlog,
wtmpx*, acct*, or whatever, the kernel and/or init are handling them.
From a logistical standpoint, it's quite a bit simpler and, yes, more
efficient, for these mechanisms to be handled in a fixed-width record
sort of manner.

kprintf() has long been eschewed as being inefficient, and, unless
something has changed recently, I understand that the standard advice
given regarding kprintf is "if you don't need to, don't."  If you've ever
looked at the guts of *printf, its inefficiencies should be pretty
obvious.

On the other hand, the kernel/init have the requisite information already
ready to just dump out in record format, because (surprise, surprise) the
records being used are already in the size and layout used by the userland
tools (obviously, the userland tools were written to deal with the output
from the kernel! :).
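
As a rough illustration of why the record format is cheap for both writer
and reader, here is a minimal sketch in C.  struct login_rec and the
helper names are hypothetical and deliberately NOT the real utmp/utmpx
layout; the point is only that appending is one write of sizeof(record)
bytes, and fetching record n is one seek plus one read, with no parsing.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed-size login record (not the real utmp layout). */
struct login_rec {
    char    line[8];        /* tty name, NUL-padded */
    char    name[16];       /* user name, NUL-padded */
    char    host[16];       /* remote host, NUL-padded */
    int64_t time;           /* seconds since the epoch */
};

/* Fill a record from C strings; over-long strings are truncated. */
void fill_rec(struct login_rec *r, const char *line, const char *name,
              const char *host, int64_t t)
{
    memset(r, 0, sizeof *r);
    strncpy(r->line, line, sizeof r->line);
    strncpy(r->name, name, sizeof r->name);
    strncpy(r->host, host, sizeof r->host);
    r->time = t;
}

/* Append one record: the in-memory struct is dumped as-is. */
int append_rec(FILE *fp, const struct login_rec *r)
{
    if (fseek(fp, 0L, SEEK_END) != 0)
        return -1;
    return fwrite(r, sizeof *r, 1, fp) == 1 ? 0 : -1;
}

/* Read record n directly: its offset is n * sizeof(record). */
int read_rec(FILE *fp, long n, struct login_rec *out)
{
    if (fseek(fp, n * (long)sizeof *out, SEEK_SET) != 0)
        return -1;
    return fread(out, sizeof *out, 1, fp) == 1 ? 0 : -1;
}

/* Number of records is just file size divided by record size. */
long count_recs(FILE *fp)
{
    if (fseek(fp, 0L, SEEK_END) != 0)
        return -1;
    return ftell(fp) / (long)sizeof(struct login_rec);
}
```

A "last"-style tool built on this would call count_recs() and then
read_rec() from the end backwards, which is exactly the access pattern a
text log makes awkward.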

So it's pre-broken, if you want to insist on your point of view.
From a programming point of view, though, massaging text into something
usable, storable and easily decodable and massageable into some other
format is, potentially, a ROYAL P.I.T.A. -- it's a lot easier to
decode binary data into a plethora of desired outputs than to get it
back into that format in the first place.  This may OR MAY NOT matter
much on a small scale, but when you look at the larger picture,
it's significant.

Also, by your argument that "it is not the Unix world way or the Internet
world way", it could just as easily be extended back into the filesystem,
even if you're just referring at the moment to "logs".

For example, I will point you to the output of "ls -lis", especially on
systems that do NOT put spaces anywhere between the formatting characters
in the format string (i.e. "%10s%8d%8d%9s"):

	* converting 'drwxr-xr-t' into 041755 involves some work.  The
	  '04' is easy enough, but then converting the rest is a pain.
	* if you have a link count or a filesize that exceeds the width
	  of the field specified in the formatting string, you have
	  fields which are run together in an unparsable manner.
	  The game is now over.
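
Both bullets can be sketched in C.  The function names mode_from_ls() and
format_row() are made up for illustration, the octal file-type constants
are the traditional values (an assumption; real code should use the S_IF*
macros from <sys/stat.h>), and format_row() simply reuses the format
string quoted above rather than any real ls source.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Turn a 10-character ls mode string such as "drwxr-xr-t" back into a
 * numeric mode.  Returns -1 if the string is not understood. */
long mode_from_ls(const char *s)
{
    static const char rwx[] = "rwxrwxrwx";
    long mode;
    int i;

    if (s == NULL || strlen(s) != 10)
        return -1;

    switch (s[0]) {                     /* file type: the easy part */
    case '-': mode = 0100000; break;    /* regular file */
    case 'd': mode = 0040000; break;    /* directory */
    case 'l': mode = 0120000; break;    /* symbolic link */
    case 'c': mode = 0020000; break;    /* character device */
    case 'b': mode = 0060000; break;    /* block device */
    case 'p': mode = 0010000; break;    /* FIFO */
    case 's': mode = 0140000; break;    /* socket */
    default:  return -1;
    }

    for (i = 0; i < 9; i++) {           /* permission bits: the pain */
        char c = s[i + 1];
        long bit = 1L << (8 - i);

        if (c == rwx[i]) {
            mode |= bit;
        } else if (c == '-') {
            /* bit not set */
        } else if (i % 3 == 2) {
            /* execute slots double as setuid/setgid/sticky markers */
            long special = (i == 2) ? 04000 : (i == 5) ? 02000 : 01000;
            char lower = (i == 8) ? 't' : 's';
            char upper = (i == 8) ? 'T' : 'S';

            if (c == lower)
                mode |= special | bit;  /* special bit and execute */
            else if (c == upper)
                mode |= special;        /* special bit, no execute */
            else
                return -1;
        } else {
            return -1;
        }
    }
    return mode;
}

/* Format one row with no separators between the fields. */
int format_row(char *buf, size_t n, const char *mode, int links,
               int size, const char *owner)
{
    return snprintf(buf, n, "%10s%8d%8d%9s", mode, links, size, owner);
}
```

mode_from_ls("drwxr-xr-t") gives 041755, but only after all of the above.
Formatting a row with 2 links and a size of 123456789 yields the digits
"2123456789" back to back, and nothing in the text says where the link
count ends and the size begins.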

And finally, to address perhaps the simplest characteristic of
wtmp, wtmpx, acct, lastlog and lastlogx:  It isn't broken by any stretch
of the imagination (save that wtmp records became too small with the
advent of the commercial internet!).  It doesn't need to be fixed.

				--*greywolf;
--
Mankind should spend much more effort on education, escalating their levels of
intellect to those currently enjoyed by the upper minority, rather than forcing
the higher ones to lower themselves to the level of the average moron.  Until
this happens, we will not truly advance our species, we'll only perpetuate it.