Subject: Re: cgd: booting unattended (without it)
To: None <port-i386@NetBSD.org>
From: Anne Bennett <anne@encs.concordia.ca>
List: port-i386
Date: 01/21/2007 21:20:07
On Friday morning I described this problem: 

> If my system has to boot without me present, I'd like it to come up
> but simply not configure the cgd nor mount its filesystem(s).  I'm
> making sure that nothing on the cgd is required for basic system
> operation, such that I can always come in later (remotely if
> necessary) and manually configure the cgd and mount filesystems.

I suggested possibly adding a timeout facility to "cgdconfig" to
avoid a situation where the boot sequence stalls indefinitely while
"cgdconfig" waits for a human to enter a password, or working around
the problem by wrapping its invocation in rc.d with one of the "wait
for input but time out" facilities described in:

  http://www.samag.com/documents/s=9369/sam0610j/0610j.htm

Suggestions so far include:

Steven M. Bellovin <smb@cs.columbia.edu>:
| I haven't tried it, but I wonder if there's some way to do this
| using amd.

Hauke Fath <hf@spg.tu-darmstadt.de>:
| I thought about that, too... There is a 'program' filesystem type in
| amd(8) which then calls configurable tools for {,un}mounting a filesystem.
| 
| The question here is whether cgd needs interaction with the user during
| mount; my understanding is that since amd(8) runs daemonized, whatever
| it execs is not provided with a terminal.

Alan Barrett <apb@cequrux.com>:
| I have a version of getpass() that contains an embedded HTTPS web
| server, so you can type the password either on /dev/tty or on a secure
| web page at a non-standard port.  For now, I am just offering the idea,
| not the code.  My implementation uses a modified version of shttpd.

matthew sporleder <msporleder@gmail.com>:
| Shouldn't amq(8) or something like it be able to allow interaction of
| this type?

Blair Sadewitz <blair.sadewitz@gmail.com>:
| Would there be a way to encrypt the cgd key with an ssh key?  Then
| upon user authentication one could add hooks to set up the cgd which
| could then be mounted with amd (or whatever).


It's possible that "amd" is a more general solution (though indeed the
interaction with the user would presumably be the sticky part), but for
my purposes at the moment, I just wanted my use of cgd to not stall the
whole boot process.  This turned out to be surprisingly difficult to do.
My first attempt used the script "timeout_countdown" which I got from
"http://www.samag.com/documents/s=9369/sam0610j/0610j_l1.htm", but this
script uses some commands that are not in /bin and so might not always
be available with only / mounted (such as "tail"), and it expects to
be able to write to /tmp (or somewhere), which isn't possible at that
point because / is mounted read-only.  Changing the REQUIRES/PROVIDES
dependencies to move cgd after "root" introduces a circular dependency
and is clearly the wrong thing to do.  After hacking away at this for
probably three hours, trying to make it work using only the programs in
/bin, I decided that this is probably harder than it would have been to
try to patch "cgdconfig" to add a timeout!  :-/

I fell back to another method suggested in the same article, which
uses only "stty" (which *is* available in /bin) to enforce a timeout
on input.  This works (I append my modified cgd for your entertainment
or disgust, not sure which!), and an attempt to mount a filesystem on
the unconfigured device simply gives an error but doesn't interfere
with the boot sequence.  HOWEVER, "fsck" flags the filesystem
(/dev/rcgd0a) as having unexpected inconsistencies.
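
For anyone who wants to try the "stty" trick in isolation, here is a
minimal sketch of it, separate from the rc.d script appended below.
The function name "read_with_timeout" is made up for illustration; it
assumes a POSIX-style sh and stty, and it falls back to a plain read
when stdin is not a terminal.  Note that with "min 0" the timeout
applies to each read(2) call, and that the stty "time" value is
commonly limited to 255 tenths of a second, so this is probably only
good for timeouts up to about 25 seconds.

```shell
# Hypothetical helper: read one line from stdin, giving up after $1 seconds.
# Prints whatever was read (possibly nothing) on stdout.
read_with_timeout() {
	_secs=$1
	_line=""
	if [ -t 0 ]; then
		# Save the terminal settings so they can be restored afterwards.
		_old=`stty -g`
		# Non-canonical mode; "time" is in tenths of a second, and
		# "min 0" lets read(2) return empty-handed when it expires.
		stty -icanon min 0 time `expr "$_secs" \* 10`
		read _line
		stty "$_old"
	else
		# Not a terminal (e.g. a pipe): stty does not apply, just read.
		read _line
	fi
	echo "$_line"
}
```

It could be used as: answer=`read_with_timeout 15`, followed by a test
for an empty "$answer".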

*** QUESTION #1: Is it a bug that "fsck -p" tries to read /dev/rcgd0a
                 when cgd0 is not configured?

*** QUESTION #2: Is the idea of modifying cgdconfig to allow a
                 timeout, or of modifying /etc/rc.d/cgd to check
                 for user presence before trying cgdconfig, of
                 interest?  If the consensus is yes, I can submit
                 a PR to that effect, but if the consensus is that
                 this is not a good idea, I won't bother the good
                 folks who receive and process PRs.

Now, to continue, I have two choices:

  (a) Just turn off fsck for the cgd filesystems in fstab, and hope
      for the best, or, as I had originally thought:

  (b) Make the relevant filesystem(s) "noauto" in fstab, and add an
      rc script to (fsck and) mount those filesystems only if it can
      be determined that the cgd was configured (not sure yet how
      to do that).  In this respect, this very helpful suggestion was
      made:

          Christian Biere <christianbiere@gmx.de>:
          | You can see this with sysctl:
          | $ sysctl hw.disknames
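
That check can be sketched as a small shell function.  This is only an
illustration: the name "disk_is_configured" is hypothetical, and it
assumes that a configured cgd appears in hw.disknames under its device
name (e.g. "cgd0"), as suggested above.  The optional second argument
is there only so the function can be tried without a real sysctl.

```shell
# Hypothetical helper: succeed if disk $1 is in the list of disk names.
# The list is $2 if given, otherwise the kernel's hw.disknames.
disk_is_configured() {
	_want=$1
	if [ $# -ge 2 ]; then
		_disks=$2
	else
		_disks=`sysctl -n hw.disknames`
	fi
	# Compare whole names, so that "cgd" does not match "cgd0".
	for _d in $_disks; do
		if [ "$_d" = "$_want" ]; then
			return 0
		fi
	done
	return 1
}
```

For example: disk_is_configured cgd0 && echo "cgd0 is configured"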

I tried "a", and it seems to work fine, but it's a bit of a risk to
never fsck that filesystem, I fear.

My first attempt at "b" still failed with an fsck error!  It seems that
an attempt is made by "fsck -p" to fsck even a "noauto" filesystem;
I would not have expected that.  I can work around it easily enough
by turning off fsck for it in fstab, but this leads to:

*** QUESTION #3: Is it a bug that "fsck -p" tries to fsck a filesystem
                 which has the "noauto" option?
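
For reference, "turning off fsck in fstab" means putting 0 in the
sixth (pass number) field of the entry.  A sketch of such an entry,
with a made-up mount point and assuming an FFS filesystem on cgd0a:

```
# device     mount point  type  options    dump  fsck pass
/dev/cgd0a   /secure      ffs   rw,noauto  0     0
```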

And finally, yes, option "b" does work.  The system can boot while
I'm not around, and simply comes up cleanly without the cgd
filesystem.  I can then get on later and invoke
  /etc/rc.d/cgd start
  /etc/rc.d/mountcgdbypass start
and all is well.  If I'm available when the system boots, then the 
system comes up with all filesystems available.
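
The appended mountcgdbypass script takes its list of mount points from
the variable cgd_bypass_noauto, which would normally be set in
/etc/rc.conf.  A sketch of the relevant settings, with a made-up mount
point (it has to match the mount point used in fstab):

```
cgd=YES
cgd_bypass_noauto="/secure"
```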

Again for your entertainment, I append /etc/rc.d/mountcgdbypass, even
though it's a bletcherous hack and inefficient to boot (pun not
intended!).  It seems to me that it shouldn't be necessary, in that
if "fsck -p" properly (IMHO) ignored filesystems on unavailable
devices, that whole script would be unnecessary.  But I suppose it's
arguable whether such a change to fsck would be correct, and anyway, I
finally have my system able to boot unattended, so I'm happy enough
and can continue with the set-up.

Masses of software compilations, here I come!


Uh-oh; I just tried to "shutdown -r", and saw this:

   unmounting filesystems... done
   panic: wdc_exec_command: polled command not done
   Stopped in pid 179.1 (reboot) at   netbsd:cpu_Debugger+0x4:  leave

I'm not much good with a debugger, but I did invoke "trace", and got:
[too lazy to actually copy out the numbers]

  cpu_Debugger([numbers]) at netbsd:cpu_Debugger+0x4
  panic([numbers]) at netbsd:panic+0x11d
  wdc_exec_command([numbers]) at netbsd:wdc_exec_command+0x1b6
  wd_flushcache([numbers]) at netbsd:wd_flushcache+0x91
  wd_shutdown([numbers]) at netbsd:wd_shutdown+0x10
  doshutdownhooks([numbers]) at netbsd:doshutdownhooks+0x2a
  cpu_reboot([numbers]) at netbsd:cpu_reboot+0x69
  sys_reboot([numbers]) at netbsd:sys_reboot+0x46
  syscall_plain() at netbsd:syscall_plain+0x1a5
  --- syscall (number 208) ---
  0xbdb3a60b

Is this because I failed to unmount my "noauto" cgd filesystem?
Should not be: "umount -a" claims to unmount "all the currently
mounted filesystems except the root".  Sigh.  Hope it doesn't
happen again...


Anne.
-- 
Ms. Anne Bennett, Senior Sysadmin, ENCS, Concordia University, Montreal H3G 1M8
anne@encs.concordia.ca                                    +1 514 848-2424 x2285
-------------------------------------------------------------------------------
#!/bin/sh
#
# $NetBSD: cgd,v 1.5 2005/03/02 19:09:22 tv Exp $
#

# PROVIDE: disks

# 2007/01/21 Anne Bennett: proceed only if user standing by with password.
#  This avoids indefinite boot hang if no one is there.
#  WARNING: make sure no attempt is made to "fsck" filesystems on
#  potentially unconfigured devices.
timeout=15

$_rc_subr_loaded . /etc/rc.subr

name="cgd"
rcvar=$name
start_cmd="cgd_start"
stop_cmd=":"

cgd_start()
{
	if [ -f /etc/cgd/cgd.conf ]; then
		answer=""
		echo    "Are you ready to configure the CGD?"
		echo -n "Please respond within $timeout seconds (y or n): "

		# Timeout for stty is in tenths of a second:
		stty_to=`expr 10 \* $timeout`
		# Keep old tty settings:
		oldtty="`stty -g`"
		# Turn off canonical mode, turn off signals, enforce timeout:
		stty -icanon min 0 time $stty_to -isig
		# Try to get user input:
		read answer
		# Put terminal back to normal:
		stty "$oldtty"

		# Quote $answer so an empty or multi-word reply is tested safely.
		if [ -z "$answer" ] ; then
			echo "No response within $timeout seconds; skipping CGD configuration."
		else
			if expr "$answer" : "[Yy]" >/dev/null ; then
				echo "Configuring CGD devices."
				cgdconfig -C
			else
				echo "Okay, skipping CGD configuration."
			fi
		fi

	fi
}

load_rc_config $name
run_rc_command "$1"
-------------------------------------------------------------------------------
#!/bin/sh
#
# $NetBSD: mountcgdbypass,v 1.0 2007/01/21 15:00:00 anne Exp $
#

# REQUIRE: disks
# BEFORE:  DAEMON

$_rc_subr_loaded . /etc/rc.subr

name="mountcgdbypass"
start_cmd="mountcgdbypass_start"
stop_cmd=":"

mountcgdbypass_start()
{
	#	Mount `cgd' filesystems specified in $cgd_bypass_noauto,
	#	but only if their cgd is configured.
	#

	if [ -z "$cgd_bypass_noauto" ]; then
		return 0
	fi

	# Of the specified list, keep only those not already mounted.
	mount_candidates="`
	for _fs in $cgd_bypass_noauto; do
		mount | (
			_ismounted=false
			while read what _on on _type type; do
				if [ "$on" = "$_fs" ]; then
					_ismounted=true
				fi
			done
			if $_ismounted; then
				:
			else
				echo $_fs
			fi
		)
	done`"
	mount_candidates="`echo $mount_candidates`"

	if [ -z "$mount_candidates" ]; then
		return 0
	fi

	# Of the remaining candidates, keep only those whose dev is configured.
	configured_disks="`sysctl -n hw.disknames`"
	ready_candidates="`
	for disk in $configured_disks ; do
		( while read fs mtpt type opt fsck1 fsck2 ; do
			case $fs in
				/dev/$disk[a-p])
					if expr \" $mount_candidates \" : \".* $mtpt \" > /dev/null ; then
						echo $mtpt
					fi
					;;
				*)
					;;
			esac
		done ) < /etc/fstab
	done`"
	ready_candidates="`echo $ready_candidates`"


	# For the remaining candidates, fsck and mount if okay
	for fs in $ready_candidates ; do
		echo Checking and mounting bypassed CGD filesystem $fs
		fsck -p $fs && mount $fs
	done

}

load_rc_config $name
run_rc_command "$1"
-------------------------------------------------------------------------------