Subject: Re: maintaining many Xen VMs
To: Steven M. Bellovin <smb@cs.columbia.edu>
From: Johnny Lam <jlam@pkgsrc.org>
List: port-xen
Date: 10/23/2006 12:01:53

Steven M. Bellovin wrote:
> I recently brought up 3 DomUs; the result, of course, is that I now have 3
> more machines to administer.  This is known as a bad tradeoff...  I'm
> curious, though, how other people are solving this.
> 
> My (NetBSD) DomUs are going to be mostly identical.  I was thinking of
> having a shared, read-only /usr and separate /var.  I probably need
> separate roots, if only to have separate /etc/rc.conf files.  /usr will be
> a real partition, probably shared with the Dom0.  The Dom0 would also have
> a separate partition that held the vnds for the DomUs.
> 
> The problem is updating pkgsrc -- compilations on the Dom0 (with the DomUs
> shut down) would be slow, since I'm not allocating much physical memory to
> Dom0.
> 
> Anyway -- how are other people handling this?  I thought about NFS, but I
> suspect it's too slow.

I use disk images mounted on vnd devices for my domUs.  My domUs are 
mostly the same, so what I do is:

    1) Have one read-only root filesystem image that has a full NetBSD
       installation (about 400 MB).

    2) Create a separate disk image for the /usr/pkg filesystem per domU.

    3) Create a separate disk image for the /local filesystem per domU.
       These vary in size depending on the domU and provide local
       storage space.

    4) Null-mount /local/etc and /local/var over /etc and /var during
       startup.

    5) Use MFS for the /dev mount.
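
As a rough sketch of how the images in steps 1-3 might be handed to a domU, the function below prints a Xen "disk" stanza. The order (root, swap, local, pkg) follows the xbd0..xbd3 assignments in the fstab and cgd.conf written by the attached setup_local script. The /xen directory layout, the image file names, the 0x0-0x3 device numbers, and the function name are assumptions for illustration, not taken from the original setup.

```shell
#!/bin/sh
# Print a Xen "disk = [...]" stanza for one domU.
# Hypothetical layout: $imgdir/root.img is shared by every domU (read-only);
# the per-domU images live in $imgdir/<name>/.
# Order matters: it is meant to line up with xbd0..xbd3 in the attached fstab.
domu_disk_stanza() {
	name=$1
	imgdir=${2:-/xen}
	printf "disk = [ 'file:%s/root.img,0x0,r',\n" "$imgdir"
	printf "         'file:%s/%s/swap.img,0x1,w',\n" "$imgdir" "$name"
	printf "         'file:%s/%s/local.img,0x2,w',\n" "$imgdir" "$name"
	printf "         'file:%s/%s/pkg.img,0x3,w' ]\n" "$imgdir" "$name"
}

domu_disk_stanza www
```

The shared root image is listed read-only because several domUs would be using it at the same time; everything else is private to one domU and writable.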

This lets you:

    * Update the base install of NetBSD across all your domUs at once by
      swapping in a new root.img.  This is mainly useful for tracking a
      minor or teeny release, not for a major upgrade.

    * Swap out the pkg.img with one with newer packages on it, and
      quickly swap back when stuff breaks.

    * Creatively make /local into a cgd filesystem, so only the local
      data is encrypted while the packages and root filesystem aren't.
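
The swap-and-roll-back idea for pkg.img can be sketched with two small helpers. This is only an illustration: the function names and the .prev/.bad suffixes are made up here, and it assumes the domU using the image is shut down while the files are moved.

```shell
#!/bin/sh
# swap_image CURRENT NEW
#   Install NEW as CURRENT, keeping the old image as CURRENT.prev.
swap_image() {
	cur=$1
	new=$2
	[ -f "$new" ] || { echo 1>&2 "swap_image: missing $new"; return 1; }
	if [ -f "$cur" ]; then
		mv -f "$cur" "$cur.prev" || return 1
	fi
	mv -f "$new" "$cur"
}

# rollback_image CURRENT
#   Put CURRENT.prev back, setting the broken image aside as CURRENT.bad.
rollback_image() {
	cur=$1
	[ -f "$cur.prev" ] || { echo 1>&2 "rollback_image: no $cur.prev"; return 1; }
	if [ -f "$cur" ]; then
		mv -f "$cur" "$cur.bad" || return 1
	fi
	mv -f "$cur.prev" "$cur"
}
```

For example, "swap_image /xen/www/pkg.img /xen/build/pkg-new.img" followed, if the new packages misbehave, by "rollback_image /xen/www/pkg.img" (both paths are hypothetical). The same pattern would apply to root.img.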

There are some modifications that need to be made to /etc in the 
root.img file and also to /local/etc in the local.img files.  I've 
attached some quick-and-dirty scripts that I use to set up my domUs.  I 
offer them without much explanation, so you'll have to read through them 
to figure out what's going on.
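
Going by their option parsing, the scripts are invoked as "setup_root [-k kernel] root.img" and "setup_local [-c keyfile] root.img local.img". The dry-run helper below only prints the commands for one shared root image and a list of domUs; the function name and the <name>-local.img naming are assumptions for illustration.

```shell
#!/bin/sh
# Dry run: print the commands that would prepare one shared root image and
# one local image per domU.  Nothing is executed.
# The "<name>-local.img" file naming is hypothetical.
plan_domu_setup() {
	root=$1
	shift
	echo "./setup_root -k netbsd-XENU $root"
	for name in "$@"; do
		echo "./setup_local $root $name-local.img"
	done
}

plan_domu_setup root.img www mail
```

Both scripts check that the image files already exist and mount partition "a" of each image, so the images need a disklabel and a filesystem before either script is run. Adding "-c keyfile" to setup_local goes through the block-cgd helper for an encrypted /local.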

	Cheers,

	-- Johnny Lam <jlam@pkgsrc.org>

Attachment: setup_root

#!/bin/sh

block_file=/usr/pkg/etc/xen/block-file
img=
kernel=netbsd-XENU
tmpdir=/tmp/$$

# Parse options
while [ $# -gt 0 ]; do
	case "$1" in
	-k)	kernel=$2; shift 2 ;;
	-*)	echo 1>&2 "$0: unknown option \`\`$1''"; exit 1 ;;
	*)	break ;;
	esac
done

if [ $# -lt 1 ]; then
	echo 1>&2 "$0: missing image file"
	exit 1
fi

img="$1"

###
### Check for necessary files before starting.
###
if [ ! -x "$block_file" ]; then
	echo 1>&2 "$0: $block_file cannot be executed"
	exit 1
fi
if [ ! -f "$img" ]; then
	echo 1>&2 "$0: missing $img";
	exit 1
fi
if [ ! -f "$kernel" ]; then
	echo 1>&2 "$0: missing $kernel";
	exit 1
fi

###
### Mount the root image on $tmpdir/mnt for manipulation.
###
echo "Mounting $img"
dev=`$block_file bind $img`
vnd="${dev#/dev/}"; vnd="${vnd%[a-z]}"
mnt=$tmpdir/mnt
mkdir -p $mnt
mount /dev/${vnd}a $mnt

###
### Put the XENU kernel into place.
###
echo "Copying XENU kernel into $img"
# -k may name a path; only the basename ends up inside $mnt.
kname="${kernel##*/}"
rm -f $mnt/netbsd
cp -f $kernel $mnt
case "$kname" in
netbsd) ;;
*)	ln -f $mnt/$kname $mnt/netbsd ;;
esac

###
### Prepare for MFS /dev.  Modify the "init" target of the MAKEDEV script
### to create a few more virtual disk devices and also the power devices.
###
echo "Preparing MFS /dev"
makedev=$mnt/dev/MAKEDEV
if [ -f $makedev ]; then
	rm -rf $mnt/dev/[a-z]*
	[ -f $makedev.orig ] || mv -f $makedev $makedev.orig
	awk '/makedev xbd0 xbd1 xencons/ {
		sub("xencons", "xbd2 xbd3 xbd4 xencons");
		print $0;
		print "	makedev sysmon";
		print "	makedev clockctl";
		print "	makedev ipl pf crypto systrace";
		print "	makedev tun0 tun1 tun2 tun3";
		print "	makedev tap tap0 tap1 tap2 tap3";
		print "	makedev kttcp";
		next;
	     }
	     /makedev st0/ {
		print "	makedev vnd0"
		next
	     }
	     /makedev iop0/ { next }
	     /makedev ed0/ { next }
	     /makedev ld0/ { next }
	     { print }' \
		$makedev.orig > $makedev
	[ ! -x $makedev.orig ] || chmod +x $makedev
fi

###
### Create directories on which we intend to mount additional filesystems.
###	emul	emulation shadow directories
###	local	filesystem mounted on cgd(4) device	
###	usr/pkg	mount point for pkgsrc-installed software
###
echo "Populating filesystem directories"
( cd $mnt && mkdir -p emul local usr/pkg )

###
### Add hook to mount /local/etc onto /etc so that our system-specific
### configuration is used.  We insert the hook at the start of the rc
### script just before we source any other files.
###
echo "Creating local rc hook"
cat > $mnt/etc/rc.pre-hooks << 'EOF'
# rc.pre-hooks

# Mount /local to get the local configuration data.  We must fsck the
# partition beforehand to ensure that it's clean because we won't be
# able to fix any problems later on after other filesystems are mounted.
#
/sbin/fsck -p /dev/rxbd2a
case $? in
0)	;;
*)	if [ "$1" = autoboot ]; then
		kill -TERM $$
	fi
	exit 1
	;;
esac
if ! mount -t ffs /dev/xbd2a /local; then
	echo "Unable to mount /local.  Multiuser boot aborted."
	exit 1
fi

# Mount /etc to get the real configuration files.
if ! mount -t null /local/etc /etc; then
	echo "Unable to mount /etc.  Multiuser boot aborted."
	exit 1
fi

# Re-exec /etc/rc so that we use the correct configuration information,
# e.g., /etc/rc.conf settings.  Pass the original arguments along so
# that "autoboot" is still seen by the re-executed script.
#
exec /bin/sh $0 "$@"
EOF

rc=$mnt/etc/rc
if [ -f $rc ] && ! grep -q "/etc/rc.pre-hooks" $rc; then
	mv -f $rc $rc.orig
	awk '/^\. \/etc\/rc.subr$/ {
		print ". /etc/rc.pre-hooks"
	     }
	     { print }' \
		$rc.orig > $rc
	[ ! -x $rc.orig ] || chmod +x $rc
fi

###
### Cleanup
###
echo "Unmounting $img"
umount $mnt
rmdir $mnt
rmdir $tmpdir
$block_file unbind $dev

Attachment: setup_local

#!/bin/sh

block_file=/usr/pkg/etc/xen/block-file
block_cgd=/usr/pkg/etc/xen/block-cgd
img_root=
img_local=
keyfile=
tmpdir=/tmp/$$

# Parse options
while [ $# -gt 0 ]; do
	case "$1" in
	-c)	keyfile="$2"; shift 2 ;;
	-*)	echo 1>&2 "$0: unknown option \`\`$1''"; exit 1 ;;
	*)	break ;;
	esac
done

if [ $# -lt 2 ]; then
	echo 1>&2 "$0: missing image files"
	exit 1
fi

img_root="$1"
img_local="$2"

###
### Check for necessary files before starting.
###
if [ ! -x "$block_file" ]; then
	echo 1>&2 "$0: $block_file cannot be executed"
	exit 1
fi
if [ -n "$keyfile" ] && [ ! -x "$block_cgd" ]; then
	echo 1>&2 "$0: $block_cgd cannot be executed"
	exit 1
fi

###
### Check that both image files exist.
###
if [ ! -f "$img_root" ]; then
	echo 1>&2 "$0: missing $img_root";
	exit 1
fi
if [ ! -f "$img_local" ]; then
	echo 1>&2 "$0: missing $img_local";
	exit 1
fi

###
### Mount the root image read-only on $tmpdir/mnt.
###
echo "Mounting $img_root"
dev_root=`$block_file bind $img_root`
vnd_root="${dev_root#/dev/}"; vnd_root="${vnd_root%[a-z]}"
mnt_root=$tmpdir/mnt
mkdir -p $mnt_root
mount -r /dev/${vnd_root}a $mnt_root

###
### Mount the local image on $tmpdir/local (through cgd if a keyfile
### was given).
###
echo "Mounting $img_local"
dev_local=`$block_file bind $img_local`
vnd_local="${dev_local#/dev/}"; vnd_local="${vnd_local%[a-z]}"
if [ -n "$keyfile" ]; then
	dev_cgd=`$block_cgd bind /dev/${vnd_local}a $keyfile`
	cgd="${dev_cgd#/dev/}"; cgd="${cgd%[a-z]}"
fi

mnt_local=$tmpdir/local
mkdir -p $mnt_local
if [ -n "$keyfile" ]; then
	mount /dev/${cgd}a $mnt_local
else
	mount /dev/${vnd_local}a $mnt_local
fi

###
### Copy "etc", "tmp" and "var" into place.
###
echo "Copying mutable directories from $img_root to $img_local"
( cd $mnt_root && pax -rwpe etc tmp var $mnt_local/. )

###
### Restore the original /etc/rc script.
###
echo "Restoring vanilla rc script"
if [ -f $mnt_local/etc/rc.orig ]; then
	rm -f $mnt_local/etc/rc.pre-hooks
	mv -f $mnt_local/etc/rc.orig $mnt_local/etc/rc
fi

###
### Add configuration bits to re-mount /local as read-write from
### /etc/rc.d/root.
###
echo "Add bit to re-mount /local as read-write"
cat > $mnt_local/etc/rc.conf.d/root << 'EOF'
start_postcmd="root_poststart"

root_poststart()
{
	# Re-mount /local as a read-write filesystem.
	mount -uw /local
}
EOF

###
### Add additional filesystems into /local/etc/fstab.
###
echo "Add our filesystems to /etc/fstab"
fstab=$mnt_local/etc/fstab
[ -f $fstab.orig ] || mv -f $fstab $fstab.orig
cat > $fstab << 'EOF'
/dev/xbd0a / ffs rw 1 1
/dev/cgd0b none swap sw 0 0
/dev/xbd2a /local ffs rw 0 0
/dev/xbd3a /usr/pkg ffs rw 1 2
/local/etc /etc null rw
/local/tmp /tmp null rw
/local/var /var null rw
kernfs /kern kernfs rw
procfs /proc procfs rw,noauto
EOF

###
### Edit the vanilla rc.conf script to add the extra bits for the domU setup.
###
echo "Modify rc.conf for domU setup"
rc_conf=$mnt_local/etc/rc.conf
if [ -f $rc_conf ] && ! grep -q "critical_filesystems_local" $rc_conf; then
	mv -f $rc_conf $rc_conf.orig
	awk '/^wscons=/ {
		print "critical_filesystems_local=\"/local /etc /tmp /var\"";
		print "powerd=YES";
		print "savecore=NO";
		print "sendmail=NO";
		print "wscons=NO";
		next;
	     }
	     { print }' \
		$rc_conf.orig > $rc_conf
fi

###
### Set up encrypted swap.
###
echo "Set up encrypted swap for the domU"
cat > $mnt_local/etc/cgd/cgd.conf << 'EOF'
cgd0	/dev/xbd1a
EOF

cat > $mnt_local/etc/cgd/xbd1a << 'EOF'
algorithm blowfish-cbc;
iv-method encblkno;
keylength 128;
verify_method none;
keygen urandomkey;
EOF

cat > $mnt_local/etc/rc.conf.d/cgd << 'EOF'
swap_device="cgd0"
swap_disklabel="/etc/cgd/xbd1a.disklabel"
start_postcmd="cgd_swap"

cgd_swap()
{
	# Convert this dedicated swap device to contain one big swap
	# partition.
	#
	disklabel $swap_device 2>/dev/null |
	while read line; do
		case "$line" in
		d:*)	bline=" b:${line#d:}"
			bline="${bline%%unused*}  swap${bline##*unused}"
			echo "$bline"
			echo "$line"
			;;
		[a-z]:*) ;;
		*)	echo "$line"
			;;
		esac
	done > $swap_disklabel 
	if ! disklabel -R -r $swap_device $swap_disklabel 2>/dev/null; then
		echo 1>&2 "Could not write $swap_disklabel to $swap_device"
	fi
}
EOF

###
### Only turn on the console tty.
###
echo "Disable non-console ttys"
ttys=$mnt_local/etc/ttys
[ -f $ttys.orig ] || mv -f $ttys $ttys.orig
sed -e "/^tty/{s,on secure,off secure,;}" $ttys.orig > $ttys

###
### Disable the daily and weekly checks in root's crontab.
###
echo "Disable daily and weekly checks in root's crontab"
crontab=$mnt_local/var/cron/tabs/root
awk '/^[^#].*\/etc\/(daily|weekly)/ { print "#" $0; next } { print }' \
	$crontab > $crontab.new
mv -f $crontab.new $crontab

###
### Cleanup
###
echo "Unmounting $img_root and $img_local"
umount $mnt_root
umount $mnt_local
rmdir $mnt_root
rmdir $mnt_local
rmdir $tmpdir
$block_file unbind $dev_root
[ -z "$keyfile" ] || $block_cgd unbind $dev_cgd
$block_file unbind $dev_local
