Re: How Unix manages processes in userland

To: Erik Fair <fair%netbsd.org@localhost>
Subject: Re: How Unix manages processes in userland
From: David Holland <dholland-tech%netbsd.org@localhost>
Date: Fri, 6 Dec 2013 08:42:19 +0000

On Thu, Dec 05, 2013 at 12:34:46PM -0800, Erik Fair wrote:
 > So, daemons - watched over or not?
 > 
 > There's been a philosophical split (sort of) between the BSD/v7
 > school of thought and the System III/V school of thought, on how
 > the Unix system should startup and manage userland processes where
 > (background) daemons are concerned, with the structure and function
 > of process 1 (whether you call that /{etc,sbin}/init or something
 > else) at the center of it.

I wouldn't go so far as to say that. A better way to describe that
split is that the AT&T unixes invented a new and monstrously complex
mechanism to monitor getty and nothing else. They *could* have used
their init arrangements to start and monitor/restart daemons, but as
you note they never did, and I don't think they ever really intended
to; at the same time they also invented a different, equally
monstrously complex way to start up and shut down daemon processes
and other system services.

Anyhow, nowadays even most of the Linux world has realized that that
design is no good, and it's basically of no importance any more except
as a negative example.

What we do have, though, is a pile of hysterical raisins: the
init-getty-login combination works in a particular way that's been the
same all the way back to at least V7 and I think well before; init is
responsible for tidying up sessions started with getty because that
way you don't have an extra useless process hanging around underneath
the user's shell wasting memory. This mattered back then; now it's
just a poorly framed abstraction. It would be better if each getty
were just another anonymous daemon that spawned login (instead of
execing it) and cleaned up afterwards, like telnetd or rlogind or sshd
or basically everything else. (Note that if you're using PAM you get
an ill-conceived partial form of this behind your back for PAM
reasons...)
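
To make the spawn-versus-exec distinction concrete, here is a minimal
sketch in Python. It is not how any real getty is written; the names
run_session and cleanup are invented for illustration, and argv stands
in for login. The point is only that the parent stays alive underneath
the session so it can tidy up itself instead of leaving that to init.

```python
import os

def run_session(argv, cleanup):
    """Spawn argv as a child (rather than exec'ing it in place), wait
    for it to finish, then run cleanup. Returns the child's exit code,
    or minus the signal number if it was killed by a signal."""
    pid = os.fork()
    if pid == 0:
        # Child: become the session program (login, in getty's case).
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if the exec failed
    # Parent: remain underneath the session until it ends.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        code = os.WEXITSTATUS(status)
    else:
        code = -os.WTERMSIG(status)
    # Hypothetical hook: this is where utmp updates, tty resets and
    # similar session cleanup would go.
    cleanup(code)
    return code
```

The cost is the one extra process per session mentioned above; the
gain is that session cleanup lives next to the code that started the
session.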

Even without that cleanup there's no reason init has to be the process
that spawns getty and cleans up after getty sessions; that work could
be farmed out to another daemon.

There are a number of recent attempts to rearrange the way services
and daemons get started (and restarted) -- there's launchd, upstart,
at least one other whose name I'm forgetting, and perhaps others. So
far, none of these has seemed to me like a very good idea; they all
seem hastily conceived (without e.g. an understanding of how
init/getty/login traditionally works) and some of them just don't seem
very ... unixish.

I think if we want to improve the state of the art in this regard the
way to do it is to look at what a "service" is (in the sense of things
like "service nfsd start", not /etc/services) and try to come up with
some abstractions that make sense and aren't oversimplified or
crippled.

Right now, for example, we have "services" that are rc.d scripts
(rpcbind, sshd, syslogd, ...) that start daemons; we also have
"services" (telnetd, fingerd) that are inetd.conf entries, although
most of these are basically dead nowadays; and there are also
"services" like ipf and npf that are rc.d scripts and behave much like
daemons except that they're really kernel state.

Most of these "services" are turned on and off via rc.conf, but not
all of them. (For example, if you want to enable fingerd, you have to
know it's an inetd service.)

Meanwhile, not all rc.d scripts are "services"; e.g. fsck isn't and
cleartmp isn't; these are purely part of the boot sequence.

In an ideal world, all "services" would be configured the same way
(whether that way is rc.conf or something else) and you wouldn't have
to know or care about the implementation to work with them.
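
As a rough illustration of "configured the same way", here is a
sketch in Python of one front end over the three kinds of "service"
listed above. Everything in it is hypothetical: the class names, the
REGISTRY table and the service() function are invented, and the
backends only record what they would do rather than touching rc.d,
inetd.conf or the kernel.

```python
from abc import ABC, abstractmethod

class Service(ABC):
    """What the administrator sees: a name and an on/off switch."""
    def __init__(self, name):
        self.name = name
        self.enabled = False
        self.log = []  # stand-in for real side effects

    @abstractmethod
    def _apply(self, on):
        """Backend-specific work; hidden from the caller."""

    def set_enabled(self, on):
        self._apply(on)
        self.enabled = on

class DaemonService(Service):
    def _apply(self, on):
        self.log.append(("run rc.d script", "start" if on else "stop"))

class InetdService(Service):
    def _apply(self, on):
        self.log.append(("edit inetd.conf and reload inetd", on))

class KernelStateService(Service):
    def _apply(self, on):
        self.log.append(("load or flush kernel rules", on))

# Hypothetical registry: the only place that knows which kind is which.
REGISTRY = {s.name: s for s in (
    DaemonService("sshd"),
    InetdService("fingerd"),
    KernelStateService("npf"),
)}

def service(name, on):
    """Turn a service on or off without knowing how it is implemented."""
    REGISTRY[name].set_enabled(on)
    return REGISTRY[name].enabled
```

With something of this shape, service("fingerd", True) and
service("sshd", True) look identical to the caller, which is the
property being asked for.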

Similarly, in an ideal world, all daemons (which might or might not be
part of a "service") would have a failure and recovery/restart path
(just respawning is not necessarily adequate) and would get run in a
framework that handles this, instead of requiring manual monitoring
and hand restart.

An ideal world would also allow users to have "services"; that is, you
log in, the system enables your talkd receiver or biff proxy or
whatever and, if it's a daemon, sees to keeping it running... and shuts
it down when you log off. The complete lack of infrastructure support
for this in Unix is getting to be a fairly serious drawback.
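
For the sake of argument, the lifetime part of that could look
something like the following Python sketch. The name user_session is
invented, the commands are whatever the user has asked for, and it
deliberately covers only "start at login, stop at logoff"; keeping the
daemons running in between is a separate problem.

```python
import subprocess
from contextlib import contextmanager

@contextmanager
def user_session(service_cmds):
    """Start each command when the session begins and make sure all
    of them are stopped when the session ends."""
    procs = []
    try:
        for cmd in service_cmds:
            procs.append(subprocess.Popen(cmd))
        yield procs  # the user's login session runs here
    finally:
        # Logoff (or a failure while starting up): stop what is
        # still running, then collect every exit status.
        for p in procs:
            if p.poll() is None:
                p.terminate()
        for p in procs:
            p.wait()
```

Whatever plays the role of login would wrap the user's shell in
"with user_session(...):", so that nothing outlives the session by
accident.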

We are something like 75% to 80% of the way to having a workable
abstraction for system services, but it's still too tightly coupled to
the implementation and still all mixed up with boot-time activities.

As you note, we have bupkis for daemon management, but I don't think
it makes sense to try to tackle that without fitting it into a clear
model of system services, and preferably also of user sessions.

Also, few of the daemons we commonly use have much in the way of
useful failure and recovery behavior; for many of them if you just
respawn them blindly they'll keep crashing, and most of the rest lose
all their state such that restarting them is a long way from
transparent. Some are even worse than this: in the case of syslogd, if
it crashes you (may) silently lose data, and if something respawns it
you may also lose the ability to notice that you may have lost data...
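
To illustrate why blind respawning is not enough, here is a small
Python sketch of a restart loop that at least detects a crash loop and
reports every restart instead of doing it silently. The function name
supervise, its parameters and the thresholds are all invented for
illustration; it says nothing about recovering lost state, which is
the harder problem.

```python
import subprocess
import time

def supervise(argv, max_quick_failures=3, quick=1.0, backoff=0.1,
              on_restart=None):
    """Run argv and restart it when it exits nonzero. Give up after
    max_quick_failures consecutive failures that each happened within
    `quick` seconds of starting. Returns (outcome, restart_count)."""
    quick_failures = 0
    restarts = 0
    while True:
        started = time.monotonic()
        code = subprocess.call(argv)
        if code == 0:
            return ("exited cleanly", restarts)
        if time.monotonic() - started < quick:
            quick_failures += 1
        else:
            quick_failures = 0  # it ran for a while; start counting afresh
        if quick_failures >= max_quick_failures:
            return ("gave up: crash loop", restarts)
        if on_restart is not None:
            on_restart(code)  # make the restart visible to someone
        # Back off a little longer after each consecutive quick failure.
        time.sleep(backoff * 2 ** quick_failures)
        restarts += 1
```

The on_restart hook is the relevant part for the syslogd case: a
restart that nobody is told about hides exactly the event you would
want to know about.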

 > One more important aside that we should consider: "user sessions"
 > now come in more flavors than person pounding on a tty (pty) and a
 > shell (or three): there's FTP logins, IMAP/POP logins, and so
 > on. There have been some attempts at reflecting those in utmp(5)
 > but I don't think anyone has been consistent about it. I think we
 > ought to tie that stuff into the basic authentication libraries,
 > i.e. when a user authenticates for something, if it's going to last
 > more than a second or three (i.e. a user is asking for a
 > "session"), it ought to get an entry in utmp(5) and wtmp(5) so that
 > you can see with who(1) or w(1) the users of the system and what
 > sort of session they're in.

Yes, but more than that, there are X logins too.

 > The right place to deal with all of this is in process 1.

I don't agree; to the extent init is magic, it should not do any
unnecessary work, because that exposes it to risk of failure. To the
extent init isn't magic, it doesn't need to be process 1 any more.

I've built systems where init (that is, the process that sequences
boot and shutdown) is not process 1. I've also built systems where pid
1 is reserved; if your parent exits, getppid() returns 1, but no
actual process 1 exists. Both of these things are perfectly
straightforward; there's no more reason to have a daemon hanging
around just to call wait() than there is to have a daemon hanging
around just to call nfssvc(). Less, in fact - it's fairly easy to
implement wait/exit in a way that doesn't require orphaned processes
to be waited for.
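
As a toy model of that last point (in Python, and not taken from any
real kernel), the sketch below keeps a process table in which pid 1 is
reserved and never allocated. Orphans are pointed at the reserved pid,
and a process whose parent is the reserved pid is freed as soon as it
exits, so nothing ever has to wait() for it. The class and method
names are invented for illustration.

```python
class ProcTable:
    """Toy process table; pid 1 is reserved and never allocated."""
    RESERVED = 1

    def __init__(self):
        self.parent = {}    # pid -> ppid, live processes only
        self.zombies = {}   # pid -> (ppid, status), awaiting wait()
        self.next_pid = 2

    def spawn(self, ppid=RESERVED):
        pid = self.next_pid
        self.next_pid += 1
        self.parent[pid] = ppid
        return pid

    def getppid(self, pid):
        return self.parent[pid]

    def exit(self, pid, status=0):
        ppid = self.parent.pop(pid)
        # Live children become orphans of the reserved pid.
        for child, p in self.parent.items():
            if p == pid:
                self.parent[child] = self.RESERVED
        # Zombies only this process could have collected are freed.
        self.zombies = {z: v for z, v in self.zombies.items()
                        if v[0] != pid}
        # Keep a zombie only if a real parent exists to wait for it.
        if ppid != self.RESERVED:
            self.zombies[pid] = (ppid, status)

    def wait(self, ppid):
        """Collect one exited child of ppid, or return None."""
        for z, (p, status) in self.zombies.items():
            if p == ppid:
                del self.zombies[z]
                return z, status
        return None
```

In this model getppid() on an orphan returns 1 even though no process
1 exists, and no reaper daemon is needed.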

-- 
David A. Holland
dholland%netbsd.org@localhost
