Subject: Re: newsyslog and script execution instead of sending signal to process
To: Aaron J. Grier <agrier@poofygoof.com>
From: Greg A. Woods <woods@planix.com>
List: current-users
Date: 07/18/2007 14:33:07

At Tue, 17 Jul 2007 18:52:16 -0700, Aaron J. Grier wrote:
Subject: Re: newsyslog and script execution instead of sending signal to process
>
> On Tue, Jul 17, 2007 at 08:15:12PM -0400, Greg A. Woods wrote:
> > In fact my intention is to implement the move as a rename(2) and to barf
> > an error if it fails.  I'm sure as heck not going to call mv(1), nor am
> > I going to implement an internal file copy.
>
> is rename(2) an atomic operation?

"man rename"?  indeed it is (or at least it damn well better be!) --
that's its entire reason for being, i.e. as opposed to doing a link(2)
and then an unlink(2) like we had to do in the good old days before
someone had the insight to invent and implement rename(2).
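
A minimal sketch of the "rename or barf" move described above, written in
Python rather than C for brevity. The function name move_log is
hypothetical; os.rename is a thin wrapper over rename(2) on POSIX systems,
so there is no copy and no call to mv(1):

```python
import os
import sys

def move_log(src, dst):
    """Move src to dst with a single rename(2); complain and fail otherwise."""
    try:
        os.rename(src, dst)   # one system call, no internal file copy
    except OSError as err:
        # For example EXDEV when dst lives on a different filesystem.
        print(f"cannot rename {src} to {dst}: {err}", file=sys.stderr)
        return False
    return True
```

A cross-filesystem move fails here instead of silently degrading into a
copy, which is the behaviour argued for in the quoted text.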


> > > how does the processing program avoid polling the logfile depot?
> >
> > If it were me it wouldn't poll -- it would be invoked once a minute by
> > cron and it would exit cleanly, quickly and silently if there were no
> > files ready to process.
>
> how is that not a form of polling?

If the script runs, it might do something but then it quits -- if the
script doesn't run then it won't do anything.  The script is not
polling.  Polling a directory directly from a script is difficult to do
and usually very inefficient in one way or another.

In the case where the script is invoked repeatedly by cron then cron
might be considered to be polling, but then that's kinda under its
mandate.  :-)
(on the other hand cron could also be considered to be event driven and
the "event" is a timer ticking over, and the script run off the timer
event always runs and always exits when it is done)

I.e. you asked how the script avoids polling, and indeed the suggestion
I offered is a design which is intended to run once and quit, not to try
to stay running and do some form of polling.
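
The run-once shape can be sketched roughly as follows, in Python. The names
run_once and handler, and the "*.ready" naming pattern, are assumptions for
illustration only and do not come from any existing tool:

```python
import glob
import os

def run_once(spool_dir, handler, pattern="*.ready"):
    """Handle whatever is ready right now, then return an exit status."""
    ready = sorted(glob.glob(os.path.join(spool_dir, pattern)))
    if not ready:
        return 0              # nothing to do: quit quickly and silently
    for path in ready:
        handler(path)         # the real work is supplied by the caller
    return 0                  # no loop, no sleep: cron supplies the timer
```

The process never sleeps or loops waiting for work; a script built this way
would end with something like sys.exit(run_once(spool, handler)) and leave
the timing entirely to cron.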

Of course if it were a program with more direct access to the full range
of available system calls rather than just a shell script then it could
also avoid polling by having the kernel notify it when there's work to
do and otherwise just sit there sleeping.  Even the kernel isn't doing
the "polling" in that situation -- it's entirely event driven.
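
As one possible illustration of kernel notification, here is a sketch using
kqueue(2) through Python's select module. The choice of kqueue is an
assumption (the text above names no particular mechanism), the function
name is hypothetical, and select.kqueue only exists on BSD-derived systems:

```python
import os
import select

def wait_for_directory_change(path, timeout=None):
    """Sleep until the kernel reports a write to the directory at path.

    Returns True if an event arrived, False if the timeout expired.
    Only usable where select.kqueue exists (the BSDs and macOS).
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        kq = select.kqueue()
        try:
            event = select.kevent(
                fd,
                filter=select.KQ_FILTER_VNODE,
                flags=select.KQ_EV_ADD | select.KQ_EV_CLEAR,
                fflags=select.KQ_NOTE_WRITE,  # entries added, removed, renamed
            )
            # Register the event and block in the kernel until it fires.
            return bool(kq.control([event], 1, timeout))
        finally:
            kq.close()
    finally:
        os.close(fd)
```

The process is asleep inside kq.control() until a file is renamed into or
out of the directory, so nothing is being re-examined on a timer.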

Note further that any script invoked to do work on a file will have to
be careful to be sure it's the only invocation working on the file, and
be sure that nothing else is working on the file, etc., etc., etc.,
regardless of whether it is invoked from cron or from the log management
tool which is not newsyslog.  I.e. the complexities of safely looking to
see if there really is work to do, and safely acquiring the work while
making sure no other instance is doing the same, etc., etc., etc., must
be implemented by any and every log processing program regardless of
whether one is using some insecure feature-bloated log management tool
which can invoke the script, or whether one just invokes the script
periodically, and safely, with cron.

Given various logical constraints it's quite easy to implement a safe
log processing script if it is based on a design where it can look for
any file in a directory matching a given naming pattern, rename that
file once again to a unique private name (perhaps in yet another working
directory), and then, and only then, open it and begin work on it.  So,
once you've implemented such a program it's literally irrelevant what
its parent process is called.  Running it from cron once per minute,
perhaps under a special identity which has appropriate access to the
directories to be worked in, is the most secure way of doing things.
Note also that it doesn't depend on any particular program being the one
that kicks out the logs to be processed either.  I.e. the design is also
completely agnostic to the log generation and roll-over tools, all
except perhaps for the fact that for secure operation under an identity
separate from that of the log creator, it must interface through an
intermediary directory.
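
The claim-by-rename design above might look something like this in Python.
The function name, the directory arguments, and the private naming scheme
(original name plus PID plus a timestamp) are all illustrative assumptions,
and the sketch assumes both directories are on the same filesystem so that
rename(2) can work:

```python
import fnmatch
import os
import time

def claim_next(spool_dir, work_dir, pattern):
    """Claim one matching file by renaming it to a unique private name.

    Returns the private path, or None if there was nothing to claim.
    """
    for name in sorted(os.listdir(spool_dir)):
        if not fnmatch.fnmatch(name, pattern):
            continue
        private = os.path.join(
            work_dir, f"{name}.{os.getpid()}.{time.time_ns()}")
        try:
            os.rename(os.path.join(spool_dir, name), private)
        except FileNotFoundError:
            continue          # another instance got there first; try the next
        return private        # then, and only then, open it and begin work
    return None
```

If two instances race for the same file, only one rename succeeds; the
loser sees the source name gone and moves on, so no lock file is needed.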

One other interesting point here too:  if you don't need additional
privilege separation then it's easy enough to do log processing of
syslog logs through periodic invocation of a script run as root from
cron without any new features in any newsyslog.  Just design the script
to work on any file matching the pattern "$logname".[0-9] and then even
if there's a burst of activity which causes newsyslog to create .0 and
.1 and .2 logs and so on, an instance of the log processor script can
still just come along and safely grab the next available one (and then
even continue to work on it in parallel with the next instance).
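
To make the "$logname".[0-9] pattern concrete, here is a small Python sketch
of which names it selects. The helper name rotated_logs and the sample file
names are hypothetical, and the sketch assumes the log name itself contains
no glob metacharacters:

```python
import fnmatch

def rotated_logs(names, logname):
    """Return the names matching "$logname".[0-9], in sorted order."""
    pattern = logname + ".[0-9]"      # exactly one trailing digit
    return sorted(n for n in names if fnmatch.fnmatchcase(n, pattern))

names = ["messages", "messages.0", "messages.1",
         "messages.10", "messages.0.gz", "maillog.0"]
print(rotated_logs(names, "messages"))   # ['messages.0', 'messages.1']
```

The live log, two-digit generations, and compressed generations are all
left alone, so only the single-digit rotated files are candidates for
processing.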

-- 
						Greg A. Woods

H:+1 416 218-0098 W:+1 416 489-5852 x122 VE3TCP RoboHack <woods@robohack.ca>
Planix, Inc. <woods@planix.com>       Secrets of the Weird <woods@weird.com>
