current-users: Re: rc.d (Was Re: run levels (was Re: The new rc.d stuff...))

Subject: Re: rc.d (Was Re: run levels (was Re: The new rc.d stuff...))
To: None <current-users@netbsd.org>
From: Greg A. Woods <woods@weird.com>
List: current-users
Date: 04/24/2000 13:37:15
[ On April 24, 2000 at 11:11:38 (-0400), Perry E. Metzger wrote: ]
> Subject: Re: rc.d (Was Re: run levels (was Re: The new rc.d stuff...))
>
> The only real advantage of a metadaemon is it lets you automatically
> restart daemons that crash, but in my opinion, that's a crock. You
> should make sure the daemons don't crash in the first place. Sure
> enough, under NetBSD and similar systems, it is very rare to see
> daemons crash, but under AIX, it happened all the time...

That would be nice, but it just doesn't work that way in the real world
(yet).  If it did there wouldn't be so many hacks and tricks in so many
places for dealing with exactly this problem.  (E.g. the many scripts
and hooks in various SNMP agents that will restart a dead daemon, not to
mention the many scripts people have written over and over again to run
under cron to do the same thing.)

In fact, given the current level of science and technology for software
quality assurance, it's almost insane *not* to run all your daemons
under the umbrella of a small, simple, well tested, and easy to debug
monitor program that'll automatically restart them as necessary, and
which might even restart itself (without losing state, of course! ;-)
periodically.  Systems should be engineered in such a way that failure
is expected and can be dealt with without affecting service levels.
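
To make the idea concrete, here is a minimal sketch of such a monitor,
assuming nothing more than "run one command and start it again whenever
it dies".  The name supervise() and its max_restarts and delay
parameters are made up for illustration; they are not taken from AIX,
SAF, or any existing tool, and a real monitor would also want logging,
signal handling, and some way to be told to stop.

```python
import subprocess
import sys
import time


def supervise(argv, max_restarts=3, delay=0.1):
    """Run argv and restart it each time it exits with a failure status.

    Hypothetical helper for illustration only.  Returns the list of exit
    statuses seen.  Stops when the child exits cleanly (status 0) or once
    it has been restarted max_restarts times.
    """
    statuses = []
    while True:
        child = subprocess.Popen(argv)
        status = child.wait()          # block until the daemon exits
        statuses.append(status)
        if status == 0:
            break                      # clean exit: nothing to restart
        if len(statuses) > max_restarts:
            print(f"giving up on {argv[0]} after {max_restarts} restarts",
                  file=sys.stderr)
            break
        print(f"{argv[0]} exited with status {status}, restarting",
              file=sys.stderr)
        time.sleep(delay)              # crude throttle against restart storms
    return statuses
```

The whole thing is one loop around wait(), which is the point: the
monitor has so little to do that it can be far simpler, and so far more
reliable, than anything it watches.  The restart limit is only there so
that a daemon which dies instantly on every start doesn't spin forever.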

Connection-oriented service daemons have this benefit in effect now,
either through inetd starting them on demand for each connection, or by
not doing much in the parent and always forking to handle the processing
of each connection.  However, there are still enough other critical
daemons that do all of their work in one process and run for the uptime
of the system, and they really should be monitored by something simpler
and more reliable than they can ever be.
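
The fork-per-connection pattern mentioned above can be sketched roughly
as follows.  This is an illustration under stated assumptions, not how
any particular daemon does it: serve_one() and the handler callback are
hypothetical names, and the parent here waits for each child in turn
purely to keep the example short (a real server would go straight back
to accept() and reap children on SIGCHLD).

```python
import os
import socket


def serve_one(listener, handler):
    """Accept one connection and process it in a forked child.

    Hypothetical helper for illustration only.  The parent only accepts,
    forks, and reaps, so a failure inside handler() takes down just the
    child serving that one connection.  Returns the child's exit status,
    or the negated signal number if a signal killed it.
    """
    conn, _addr = listener.accept()
    pid = os.fork()
    if pid == 0:
        # Child: do the risky per-connection work, then exit directly.
        status = 1
        try:
            listener.close()
            handler(conn)
            conn.close()
            status = 0
        except Exception:
            pass                       # any handler failure means status 1
        finally:
            os._exit(status)           # never return into the parent's code
    # Parent: drop its copy of the connection and collect the child.
    conn.close()
    _, wait_status = os.waitpid(pid, 0)
    if os.WIFEXITED(wait_status):
        return os.WEXITSTATUS(wait_status)
    return -os.WTERMSIG(wait_status)
```

Calling serve_one() in a loop gives a server whose long-lived process
never touches client data at all, which is why this style of daemon
tends to survive bugs that would kill a single-process one.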

Perhaps the metadaemon you mention in AIX, or the Service Access Facility
in SysVr4, is overly complex and perhaps ugly, but that doesn't mean the
concept of having such a monitor program is bogus.  In fact the basic
facility of 'init' is all that's really needed, though some means of
telling 'init' when it should shut down a daemon, and so on, is very
handy.  Indeed if you look at Solaris 2 today you'll find that 'sac'
itself is restarted by 'init' any time it fails.  Unfortunately, because
of the premature termination of SysVr4 development, the integration of
daemons under 'init' and SAF is not complete, and even a modern Solaris
system is a very bad example of how to implement this sort of thing.

-- 
							Greg A. Woods

+1 416 218-0098      VE3TCP      <gwoods@acm.org>      <robohack!woods>
Planix, Inc. <woods@planix.com>; Secrets of the Weird <woods@weird.com>