Subject: Re: sandbox builds + Re: Removing All Packages
To: D'Arcy J.M. Cain <darcy@NetBSD.org>
From: Douglas Wade Needham <cinnion@ka8zrt.com>
List: tech-pkg
Date: 11/06/2004 12:10:28
Sender: tech-pkg-owner@NetBSD.org

Quoting D'Arcy J.M. Cain (darcy@NetBSD.org):
> I'm not so sure that we are that far apart in our ideas.  While I have
> dozens rather than hundreds of servers I have many of the same issues
> including multiple data centres.  I just look at one machine, my
> "staging" machine, as my sandbox.  If disaster strikes that box I am not
> dead.  I just have to rebuild.

Right now, I am only dealing with about 10 machines at home, 15 or so
at the radio observatory, and the new machines at my client.  Each of
these sites works from its own set of sandboxes (I actually have 3
sandboxes at home: 1.6.2, 2.0RC4 and -current).  However, the key is
that my sandbox is just a bunch of directories on a disk on a machine,
and is not the machine itself.  This means that I can start totally
from scratch without killing the machine on which the sandbox
resides.  So, for example, I have a machine at home with the following
directories on a 40GB drive, wd0, mounted as /u0 (root is sd0a):

    /u0/build_200          
    /u0/destdir_200	   
    /u0/dist_200
    /u0/dist_200.local
    /u0/dist_200.pkgs
    /u0/release_200

My build.sh command (run by do_build.sh) is:

    ./build.sh -o -u -D /u0/destdir_200 -R /u0/release_200 release

Notice that I disable my objdirs with '-o' and skip the "make clean"
step with '-u'.  This works because I use union mounts to layer the
source below /u0/build_200 (e.g. at /u0/build_200/usr/src), and because
I generally start with totally clean builds.
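
In rough strokes, the layering looks something like the following; the
real do_build.sh differs in detail, and the mount_union(8) invocation,
paths and ordering here are just illustrative:

    # Layer the host's /usr/src below the sandbox build tree, so reads
    # fall through to the real source while writes stay in the sandbox.
    # (Illustrative only, not the actual do_build.sh.)
    mkdir -p /u0/build_200/usr/src
    mount_union -b /usr/src /u0/build_200/usr/src
    cd /u0/build_200/usr/src
    ./build.sh -o -u -D /u0/destdir_200 -R /u0/release_200 release
    umount /u0/build_200/usr/src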

Once build.sh has completed, I produce dist_200 by copying the contents
of destdir_200 and a few kernels from build_200, then copying in a few
other files and doing some other small things such as adding users (I
have never liked YP, as I always viewed it as insecure).  This is done
by sysinst_cp.sh, which in turn calls sysinst.chroot.sh to do some of
the funny stuff (like adding the users).
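
The gist of sysinst_cp.sh is roughly the following; the kernel path,
the dist_200.local copy and the chroot step are illustrative rather
than the actual script:

    # Populate dist_200 from the build results (illustrative only).
    cp -Rp /u0/destdir_200/. /u0/dist_200/             # base sets from DESTDIR
    cp /u0/build_200/usr/src/sys/arch/i386/compile/GENERIC/netbsd \
        /u0/dist_200/netbsd                            # one of the kernels
    cp -Rp /u0/dist_200.local/. /u0/dist_200/          # site-local files
    # ...then the funny stuff, run inside the image:
    cp sysinst.chroot.sh /u0/dist_200/tmp/
    chroot /u0/dist_200 /bin/sh /tmp/sysinst.chroot.sh # e.g. useradd(8) calls
    rm -f /u0/dist_200/tmp/sysinst.chroot.sh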

Stage two is building the packages.  I do this by running mk_pkgs.sh,
which builds in dist_200.pkgs (created by copying from dist_200),
re-mounts the source using union mounts, and then loops through
directories under pkgsrc to create the packages.  The looping is
controlled by my pkglist file.  The end result is a fully populated
repository from which I can then use rdist to update machines.
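
The loop itself is nothing fancy; stripped down, mk_pkgs.sh does
something like this (the pkglist format, chroot paths and failure
handling are illustrative):

    # Build each package listed in pkglist inside the dist_200.pkgs chroot.
    CHROOT=/u0/dist_200.pkgs
    while read pkgdir; do
        case "$pkgdir" in ''|\#*) continue ;; esac     # skip blanks/comments
        chroot "$CHROOT" /bin/sh -c \
            "cd /usr/pkgsrc/$pkgdir && make package && make clean" ||
            echo "$pkgdir" >> /u0/failed_pkgs_200
    done < pkglist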

And like you say, if disaster strikes the box and somehow clobbers the
repository, I just rebuild.  But if I want to rebuild anyway, I am
only a few commands away, and I do not interfere with the other
operations of the machine while I am rebuilding.  For example, as I
type this message, I have a weekly rebuild from scratch running, but I
can still use mutt, emacs, tkined, gaim, mozilla and the countless
other things I use on my workstation.  If it completes successfully, I
will push the updated image to the machines running from this sandbox
(my web server and another bastion host, plus this machine).  If not,
I can carry on unaffected while I fix and rerun the builds.
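
The push is plain rdist(1); something like the loop below, although the
host names here are made up and my real Distfile does a bit more:

    # Push the finished tree to the machines running from this sandbox.
    for host in www bastion; do
        rdist -c /u0/dist_200 $host
    done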

My longer-term goal is to integrate these scripts a bit more, and
perhaps have a single TCL script which can report progress, detect and
display errors (including situations like pkg X failing to build
because of errors in building pkg Y), and even do the cvs updates and
halt on conflicts...

> > - Doing an rdist to a large number of machines can eat at your network
> >   resources and does take time.  This is a larger problem when you are
> >   dealing with 1200+ machines with several hundred in locations like
> >   Munich, London and Paris and you are in Columbus; while the
> >   rdist utility helps some, it could be improved.
> 
> I fix this by having secondary staging machines in the remote centres. 
> My promotion script syncs up the local servers and just one server in
> each centre.  That server is used to sync the local centre.

Yeah... that was what I ended up doing as well.  I just had another
4GB disk (oh, for the days when 4GB was more than enough for the OS/SW
disk ;)) added to an admin machine in each datacenter, and pushed to
that.  I even had a wrapper script which put a lock file into place to
delay the automatic updates running from cron.
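
The wrapper was nothing clever either; the idea was simply this (names
and paths illustrative):

    # Wrapper around the push: set a lock so the cron-driven updates wait.
    LOCK=/var/run/dist_push.lock
    touch $LOCK
    rdist -c /u0/dist_200 dc-admin     # refresh the staging copy
    rm -f $LOCK

    # ...and the cron-run update script starts by checking for it:
    [ -f /var/run/dist_push.lock ] && exit 0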

> -- 
> D'Arcy J.M. Cain <darcy@NetBSD.org>
> http://www.NetBSD.org/

-- 
Douglas Wade Needham - KA8ZRT        UN*X Consultant & UW/BSD kernel programmer
Email:  cinnion @ ka8zrt . com       http://cinnion.ka8zrt.com
Disclaimer: My opinions are my own.  Since I don't want them, why
            should my employer, or anybody else for that matter!