Subject: Re: PR 7170 -- init and /dev/console
To: NetBSD Kernel Technical Discussion List <tech-kern@NetBSD.ORG>
From: Greg A. Woods <woods@weird.com>
List: tech-kern
Date: 04/30/2001 03:01:19
[ On Sunday, April 29, 2001 at 20:52:54 (-0700), Charles M. Hannum wrote: ]
> Subject: Re: PR 7170 -- init and /dev/console
>
> I'm just going to let you ponder that.  But I will state flat out that
> you've missed something of critical importance.

I'm not so sure....

> > Actually why does
> > `init' even need /dev/console during multiuser mode?  It should only
> > syslog any /etc/ttys-related errors (and *not* to /dev/console!).
> 
> It doesn't.  And in fact you'll notice that AFTER /etc/rc runs, it even
> closes /dev/console

Ah ha!!!  I hadn't noticed that yet!  I'm glad to see my thoughts are
indeed on the right track!  ;-)

> and only opens it again if it's listed in /etc/ttys
> and enabled.

In theory at least it wouldn't take too much effort to stop having init
open ttys and instead force users to specify the tty name as one of the
options for getty -- but that's a separate issue....  ;-)

(that was one of the "fixes" I really liked about SysV's init, btw)

>  However, /etc/rc NEEDS to have /dev/console open -- and
> properly revoke(2)ed -- or you can't ^C hung processes during startup.

Well, first off for ^C to work you don't need to have some file called
"/dev/console" open....  (Personally I strongly prefer ^Z so that I can
diagnose the problem after the fact....  ;-)

Indeed if something's broken in /dev (eg. some idiot was playing around
with things that shouldn't have been played with) it's conceivable that
opening /dev/console will not connect /etc/rc to the device being used
as the kernel's console anyway.  Not doing what we're talking about here
makes correcting such mistakes a great deal harder than it should be.

As for calling revoke(2) on whatever was used to connect /etc/rc and its
sub-processes with a controlling tty device, well there's no reason I
can see why revoke() won't work on the file descriptor handed to `init'
by the kernel.

In fact there should be literally no distinguishable difference between
a file descriptor opened (as the first tty device open) on /dev/console,
and the kernel console file descriptor handed to `init' before it's
exec'ed -- that's the reason for doing it that way in the first place,
after all.
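
One way to check that claim from userland is to compare device numbers.
What follows is only a minimal sketch, not anything taken from init(8):
the names fd_matches_node() and open_and_compare() are hypothetical, and
it assumes that two character devices with equal st_rdev values are the
same device.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: return 1 if descriptor `fd' and the node at `path'
 * are both character devices with the same device number, 0 if they are
 * not, and -1 if either one cannot be stat'ed.
 */
int
fd_matches_node(int fd, const char *path)
{
	struct stat by_fd, by_path;

	if (fstat(fd, &by_fd) == -1 || stat(path, &by_path) == -1)
		return -1;
	if (!S_ISCHR(by_fd.st_mode) || !S_ISCHR(by_path.st_mode))
		return 0;
	return by_fd.st_rdev == by_path.st_rdev;
}

/*
 * Hypothetical helper: open `opened' read-only (without acquiring a
 * controlling terminal) and compare the result against `node'.
 */
int
open_and_compare(const char *opened, const char *node)
{
	int fd, r;

	fd = open(opened, O_RDONLY | O_NOCTTY);
	if (fd == -1)
		return -1;
	r = fd_matches_node(fd, node);
	(void)close(fd);
	return r;
}
```

A process could call fd_matches_node(STDIN_FILENO, "/dev/console") to
see whether the descriptor it inherited and the name in /dev agree; a 0
there would be the "someone broke /dev" case described above.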

Finally I'll note that strictly speaking no long-running process started
by /etc/rc should ever remain attached (in a controlling terminal sense)
to the console anyway.  None should ever call daemon() with a non-zero
value for the `noclose' parameter.  That doesn't mean `init' shouldn't
call revoke() too, of course -- there are always bound to be badly
written daemons in use out there....
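
To illustrate what a zero `noclose' buys you, here is a small sketch
(the function names are made up for the example).  It forks a probe
process that calls daemon(0, noclose) and reports back through a pipe
whether its stdin ended up on /dev/null.  It assumes daemon(3) behaves
as documented, i.e. that it redirects stdin, stdout and stderr to
/dev/null when `noclose' is zero.

```c
#define _DEFAULT_SOURCE	/* glibc: expose daemon(3); harmless elsewhere */
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: is `fd' the same character device as /dev/null? */
static int
fd_is_dev_null(int fd)
{
	struct stat a, b;

	if (fstat(fd, &a) == -1 || stat("/dev/null", &b) == -1)
		return 0;
	return S_ISCHR(a.st_mode) && S_ISCHR(b.st_mode) &&
	    a.st_rdev == b.st_rdev;
}

/*
 * Hypothetical probe: returns 1 if a process that called
 * daemon(0, noclose) has stdin on /dev/null, 0 if not, -1 on error.
 */
int
probe_daemon_stdin(int noclose)
{
	int pfd[2];
	pid_t pid;
	ssize_t n;
	char r = 0;

	if (pipe(pfd) == -1)
		return -1;
	pid = fork();
	if (pid == -1) {
		(void)close(pfd[0]);
		(void)close(pfd[1]);
		return -1;
	}
	if (pid == 0) {
		(void)close(pfd[0]);
		/* daemon() forks again; only the detached child returns. */
		if (daemon(0, noclose) == -1)
			_exit(1);
		r = fd_is_dev_null(STDIN_FILENO) ? '1' : '0';
		if (write(pfd[1], &r, 1) != 1)
			_exit(1);
		_exit(0);
	}
	(void)close(pfd[1]);
	n = read(pfd[0], &r, 1);	/* returns 0 if the probe failed */
	(void)close(pfd[0]);
	(void)waitpid(pid, NULL, 0);	/* reap the intermediate process */
	if (n != 1)
		return -1;
	return r == '1';
}
```

With noclose == 0 the probe should report 1.  With a non-zero value the
daemon keeps whatever descriptors it was started with, which during
/etc/rc would be the console.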

> This is one of the things that always pisses me off about Linux.

I didn't even know Linux had that problem...   :-)

Certainly SysV had that problem, though not for any really good reason
that couldn't have been fixed in the same way I'm contemplating here.
Indeed as I described earlier the SysV folks were even half-way towards
a solution since they already have a way of sending a signal from the
real kernel console to init regardless of what virtual console it
happened to believe it was using at the time.

> > It
> > only needs to open /dev/console when it receives a SIGTERM and enters
> > state '7'.
> 
> You must be living in a fantasy world, since "state '7'" doesn't exist
> in this one.

Sorry, I should have been more explicit and described that state with a
description [from init(8)], not just a reference number:

     7.   Shutdown mode.  Send SIGHUP to all controlling processes, reap the
          processes for 30 seconds, and then go to state 1 (single user),
          warning if not all the processes died.
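
The shape of that state can be sketched in a few lines.  This is a
simplified illustration only, not init's actual code: hup_and_reap() is
a hypothetical name, it works on an explicit list of child pids rather
than on the session leaders init tracks from /etc/ttys, and the grace
period is a parameter rather than a fixed 30 seconds.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Send SIGHUP to every child listed in pids[], then reap them for up to
 * roughly grace_secs seconds.  Entries that have been reaped (or that
 * could not be signalled) are overwritten with -1.  Returns the number
 * of children still alive when the grace period ends, so the caller can
 * warn about them before going on to single-user mode.
 */
size_t
hup_and_reap(pid_t *pids, size_t n, unsigned int grace_secs)
{
	const struct timespec nap = { 0, 50 * 1000 * 1000 };	/* 50 ms */
	size_t alive = 0;
	time_t deadline;

	for (size_t i = 0; i < n; i++) {
		if (pids[i] > 0 && kill(pids[i], SIGHUP) == 0)
			alive++;
		else
			pids[i] = -1;
	}

	deadline = time(NULL) + (time_t)grace_secs;
	while (alive > 0 && time(NULL) <= deadline) {
		for (size_t i = 0; i < n; i++) {
			if (pids[i] <= 0)
				continue;
			/* 0 means "still running"; anything else is gone. */
			if (waitpid(pids[i], NULL, WNOHANG) != 0) {
				pids[i] = -1;
				alive--;
			}
		}
		if (alive > 0)
			(void)nanosleep(&nap, NULL);
	}
	return alive;
}
```

A non-zero return value corresponds to the "warning if not all the
processes died" part of the description.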

> > > Plus, in anything other than a diskless environment, if your /dev is
> > > toast you can't upgrade / to r/w to fix it.
> > 
> > Huh?  That's only true if *all* of your /dev is toast.
> 
> Have you EVER seen a case where /dev/console was missing but disk
> devices were present?  I have no doubt that you'll say `yes', but *I*
> haven't, ever, under any circumstance.  The only case in which it's
> even really plausible is file system corruption.

Just for fun I'll say "no, but..."  :-)

Remember that users with fat fingers (and/or with misconceptions) can
cause infinitely more interesting scenarios than random corruption!  :-)

> In other words, if we did a bunch of half-assed hacks to deal with a
> case that never happens in practice.  No thanks.

Having process #1 (i.e. `init', regardless of what binary it is) depend
on opening some file called "/dev/console" is the real hack.  It really
should be attached in the normal stdio fashion to whatever the kernel
happens to be using as its current console device.

Anyway the point is that the "half-assed hacks" shouldn't be prevented
merely out of spite -- they are what stands between the possible and the
impossible after all.

> If we want to solve the `/dev problem' for good, then we need a devfs.

Maybe.  Once upon a time I would have said "YES!!!", but I'm not so sure
any more....  /dev is to device drivers as the directory structure is to
inodes.  That the former is layered over the latter is not necessarily a
bad thing, especially since the effect is that it provides for a very
necessary mechanism to save ownership and permissions information in a
persistent inode-level filesystem, as well as of course providing for a
convenient layer of naming indirection that's stored in the directory
structure.  In that sense there is no layering violation since
everything's done at the appropriate level.

What really bugs me in the current scheme of /dev is that the MAKEDEV
script and /usr/sbin/config don't share a common source of information
to map devices to their device numbers.  It would be really nice to have
a common table that allowed the actual numbers to shuffle about without
having to edit their mappings in two separate places.  Another layer of
naming isn't really necessary -- their numbers suffice and their names
in the kernel config tree can be used in the MAKEDEV program to map to
their filesystem names.

But I seem to have digressed....

> Doing a bunch of little hacks that give you a totally obscure recovery
> path is a suboptimal non-solution.

Obscurity is in the eye of the beholder.  Appropriate documentation
should clarify things.  (and of course the MFS trick is already
documented in some sense of the word, i.e. in i386/floppies/ramdisk-big/
dot.profile)

> > (*) I don't see any reason to prevent a read/write mount of any
> > read-only mounted filesystem as is the case today, particularly not in
> > single-user mode.  The buffer cache should keep the two views of the FS
> > in sync.
> 
> Bzzzt.  You forget the inode cache and the name cache.

Isn't the inode cache tagged by major,minor and if so shouldn't mount
aliasing be irrelevant to it?  (That's one of the areas of the kernel
with which I am most definitely not intimate....)

Maybe the name cache causes problems with aliased mounts, though will it
if one is read-only?  (I'm not incredibly intimate with the name cache
either.... :-)

In any event scratch my last suggestion there -- just creating any
/foo/bar name, on any writable filesystem mounted on /foo, as a block
device with the root device's major,minor and issuing "mount /foo/bar /"
will suffice to work around your primary objection.  Now let us get
on to re-implementing your prior work and documenting how to make use of
this new facility in the event of an emergency (i.e. so that we can get
beyond the mere elegance of having the kernel tell `init' what the
active console device happens to be)!

Or do you still have a copy of your original implementation somewhere?

-- 
							Greg A. Woods

+1 416 218-0098      VE3TCP      <gwoods@acm.org>     <woods@robohack.ca>
Planix, Inc. <woods@planix.com>;   Secrets of the Weird <woods@weird.com>