Subject: Re: sup hell
To: None <current-users@NetBSD.ORG>
From: der Mouse <mouse@Collatz.McRCIM.McGill.EDU>
List: current-users
Date: 11/10/1995 11:15:44
> Sup seems to be causing some problems lately, for varying reasons.
> [...suggestion...]

While not wishing to detract from this suggestion, it's perhaps time to
drag out something I wrote quite a while ago.  I call it compare.  It
was originally designed to check diskless root areas against one
another, to make sure they hadn't been diverging too far from the
original template, like a multi-directory diff -r, except that it just
detects and reports files that differ; it doesn't diff them.  But I
added an option to allow it to treat one of the directories as a master
and update the rest to look just like it, and that sounds a lot like
what we want here.

My current implementation would not be suitable as-is, mostly because
it would involve much unnecessary work on the sup server (the current
code would be constantly recomputing checksums for the same files; they
should be cached), and for at least two minor reasons (it's not
prepared to be set up as a daemon to accept multiple connections, and
it would have to have certain code disabled to ensure clients couldn't
scribble on the master tree).
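
To illustrate the caching I mean, here is a rough sketch.  It is not
code from compare, and everything in it is hypothetical: the names
sum_cache, cache_lookup, and cache_store, the table sizes, and the
choice of modification time plus size as the validity test are all
assumptions made for the example.  Computing the digest is left to the
caller, and a file rewritten within the same second without changing
size would be missed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

#define DIGEST_LEN  16   /* size of an MD5 digest in bytes */
#define CACHE_SLOTS 256  /* hypothetical table size */
#define PATH_LEN    256  /* hypothetical limit on cached path length */

struct cache_entry {
    int used;
    char path[PATH_LEN];
    time_t mtime;                      /* mtime when the digest was computed */
    long long size;                    /* size when the digest was computed */
    unsigned char digest[DIGEST_LEN];
};

struct sum_cache {
    struct cache_entry slot[CACHE_SLOTS];  /* must start out zeroed */
};

/* Direct-mapped table: each path hashes to exactly one slot. */
static size_t slot_for(const char *path)
{
    size_t h = 5381;
    while (*path)
        h = h * 33 + (unsigned char)*path++;
    return h % CACHE_SLOTS;
}

/* Return 1 and fill digest if a matching, still-valid entry exists; else 0. */
int cache_lookup(const struct sum_cache *c, const char *path,
                 time_t mtime, long long size, unsigned char *digest)
{
    const struct cache_entry *e;

    assert(c != NULL && path != NULL && digest != NULL);
    e = &c->slot[slot_for(path)];
    if (!e->used || e->mtime != mtime || e->size != size ||
        strcmp(e->path, path) != 0)
        return 0;
    memcpy(digest, e->digest, DIGEST_LEN);
    return 1;
}

/* Remember a freshly computed digest, overwriting whatever shared the slot. */
void cache_store(struct sum_cache *c, const char *path,
                 time_t mtime, long long size, const unsigned char *digest)
{
    struct cache_entry *e;

    assert(c != NULL && path != NULL && digest != NULL);
    if (strlen(path) >= PATH_LEN)
        return;                        /* too long to cache; just skip it */
    e = &c->slot[slot_for(path)];
    e->used = 1;
    strcpy(e->path, path);
    e->mtime = mtime;
    e->size = size;
    memcpy(e->digest, digest, DIGEST_LEN);
}
```

A server would stat each file, call cache_lookup with the st_mtime and
st_size it got back, and compute and cache_store the checksum only on a
miss.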

It would, though, handle directory deletion and symlinks easily; it
already does them right.

Here's the comment header from the current code.  I'll think about what
would be needed to turn it into a sup-alike, and probably implement
some of it anyway, since it would be useful for my purposes.

 * compare - compare file hierarchies
 * compare [flags] [[user@]machine[!program]:]directory
 *				[[user@]machine[!program]:]directory ...
 * flags (old forms still accepted)
 *	-update (old form: -u)
 *	-follow (old form: -h)
 *	-sgid-dirs (old form: -g)
 *	-mtimes (old form: -t)
 *	-no-owners (old form: -o)
 *	-encode (old form: -x)
 *	-no-sparse
 *	-trace
 *	-prune <name>
 *	-prunewild <pattern>
 *	-prunex
 *	-md5
 *	-gzip <gzip-arg>
 *	-accept <port-number>
 *	-connect <dotted-quad> <port-number>
 *	-R
 * Compare compares directories, possibly on different machines, with
 *  one another.
 * Each (non-option) argument specifies a directory.  If no :s occur in
 *  the argument, or if a slash appears before the first : or !, the
 *  directory is on the local machine.  Otherwise, the portion before
 *  the colon specifies a machine name with optional user and program
 *  specifications.  Compare uses rsh(1), which must be in the path, to
 *  run compare on the remote machine.  If a username is given with an
 *  @ before the machine name, it will be used with rsh's -l option to
 *  specify the remote username.  If a program name is specified after
 *  the machine name with a !, that will be the name used to run
 *  compare on the remote machine.  The default for the username is
 *  whatever rsh feels like using (normally the local username); the
 *  default for the program name is the argv[0] value for the master
 *  run.
 * If the machine name is an empty string (ie, the argument begins with
 *  a colon), the argument is taken to be of the form
 *	:keyword:info
 *  to specify an alternative way of contacting the remote.  Currently,
 *  the only defined ways are
 *	:connect:dotted-quad:port-number:directory
 *  and
 *	:accept:port-number:directory
 *  These cause compare to do a connect or a listen/accept with the
 *  given port number (and address, for "connect") to establish the
 *  connection to the remote compare process.  This is useful when you
 *  have shell access on two machines but can't (or don't want to)
 *  allow rsh access between them.  See the -R, -accept, and -connect
 *  options for more.  Note that certain options must normally agree
 *  between the master and the remotes, either given to both or given
 *  to neither.  These options are -mtimes and -encode.  In addition,
 *  -gzip must be given to the remote if it's given to the master,
 *  though the converse is not true (if the master is not given -gzip,
 *  it will not matter whether the remotes are or not).  Other flags
 *  (notably -follow, -no-owners, and -sgid-dirs) may usefully vary
 *  between the master and the remotes.
 * -update means to update: the first-named directory is treated as a
 *  master copy and all others are made identical to it, like a
 *  super-picky rdist (compare compares more things than rdist does).
 * -follow means to follow symlinks instead of checking the links
 *  themselves for matching.
 * -sgid-dirs says to ignore the set-group-ID bits on directories when
 *  comparing modes and to leave them alone when setting modes.
 * -mtimes says that modification times are important and should be
 *  considered when comparing for differences and preserved when
 *  updating.
 * -no-owners says that the owning UID and GID values for files are
 *  unimportant and should not be set or compared.
 * -encode says that network communication should be passed through a
 *  simple binary-to-printable filter.  (This _should_ never be needed,
 *  but the world is far from ideal, and it does seem to help in some
 *  cases.)
 * -no-sparse says that files should never be created sparse.  Without
 *  this, files compare creates will always be as sparse as possible.
 * -trace says to trace data sent and received between the master
 *  process and the auxiliaries.  (If -encode is in effect, this traces
 *  the encoded form as well.)  Trace output is sent to stderr.
 * -prune <name> says that anything called <name> (which should be a
 *  path relative to the directories being compared) should be ignored
 *  when found: neither it nor anything under it should be compared,
 *  updated, or otherwise touched.  compare behaves as if no such names
 *  existed anywhere, except that if a directory is copied wholesale
 *  because it was nonexistent during an update operation, -prune
 *  entries referring to things inside that directory will not limit
 *  what is copied.
 * -prunewild <pattern> is just like -prune except that the argument is
 *  a globbing pattern rather than a simple name.
 * -prunex says that the sense of the tests used for pruning
 *  should be reversed: everything is pruned _except_ the things named
 *  with -prune or -prunewild arguments.
 * -md5 says that when plain files' contents need to be compared,
 *  rather than comparing the full contents, simply compute an md5
 *  checksum of each file, and assume that matching checksums mean
 *  matching contents.  (This is useful when running over a network
 *  link sufficiently slow that it takes longer to send the file than
 *  it does to compute its checksum, and you're willing to take the
 *  slight risk that different files will checksum the same.)
 * -gzip says that when copying files from one place to another, the
 *  files are to be gzipped at the point of reading and gunzipped at
 *  the point of writing.  (The intention is to reduce bytes sent over
 *  the net; this is intended for use across slow links, where cpu
 *  cycles are cheaper than network bytes.)  The argument is passed to
 *  gzip; it is expected to be something like --best or --fast.
 * -R says that this compare process is a remote.  It is normally not
 *  needed unless you're using explicit rendezvous points instead of
 *  the usual rsh way of starting the remotes.
 * -accept is useful only with -R.  It specifies that the process is to
 *  listen/accept on the given port number to establish the connection
 *  to the master, instead of assuming the connection is already
 *  present on stdin and stdout.  -R -accept is used when the master
 *  uses a :connect:-style remote specifier.
 * -connect is useful only with -R.  It specifies that the process is
 *  to connect to the given port number at the given address to
 *  establish the connection to the master, instead of assuming the
 *  connection is already present on stdin and stdout.  -R -connect is
 *  used when the master uses a :accept:-style remote specifier.
 * No provision is made for user names containing /s, @s, or :s,
 *  machine names containing /s, !s or :s, or program names containing
 *  :s.
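
The argument-parsing rules in that header can be sketched as follows.
This is an illustration written from the description above, not an
excerpt from compare: the names dirspec and parse_dirspec and the
buffer sizes are made up for the example, the :keyword:info form is
only recognized and not decoded, and rejecting an empty machine name
that is not introduced by a leading colon is an assumption, since the
header does not say what happens in that case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum spec_kind { SPEC_LOCAL, SPEC_RSH, SPEC_KEYWORD };

struct dirspec {
    enum spec_kind kind;
    char user[64];      /* empty: let rsh choose the username */
    char machine[256];
    char program[256];  /* empty: caller falls back to argv[0] */
    const char *rest;   /* directory, or keyword:info; points into arg */
};

/* Copy n bytes into dst and NUL-terminate; -1 if they do not fit. */
static int copy_piece(char *dst, size_t cap, const char *src, size_t n)
{
    if (n >= cap)
        return -1;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return 0;
}

/* Classify one non-option argument.  Returns 0 on success, -1 on error. */
int parse_dirspec(const char *arg, struct dirspec *out)
{
    const char *colon, *bang, *at, *mach;
    size_t first_sep, head_len;

    assert(arg != NULL && out != NULL);
    out->user[0] = out->machine[0] = out->program[0] = '\0';
    out->rest = arg;

    colon = strchr(arg, ':');
    first_sep = strcspn(arg, ":!");   /* offset of first ':' or '!' */

    /* Local: no colon at all, or a slash before the first ':' or '!'. */
    if (colon == NULL || memchr(arg, '/', first_sep) != NULL) {
        out->kind = SPEC_LOCAL;
        return 0;
    }
    /* Leading colon: the :keyword:info form, left undecoded here. */
    if (colon == arg) {
        out->kind = SPEC_KEYWORD;
        out->rest = arg + 1;
        return 0;
    }

    /* Otherwise [user@]machine[!program] precedes the first colon. */
    out->kind = SPEC_RSH;
    out->rest = colon + 1;
    bang = memchr(arg, '!', (size_t)(colon - arg));
    head_len = (size_t)((bang ? bang : colon) - arg);
    at = memchr(arg, '@', head_len);
    mach = at ? at + 1 : arg;

    if (at && copy_piece(out->user, sizeof out->user,
                         arg, (size_t)(at - arg)) < 0)
        return -1;
    if (mach == arg + head_len)       /* empty machine name (assumed error) */
        return -1;
    if (copy_piece(out->machine, sizeof out->machine,
                   mach, (size_t)(arg + head_len - mach)) < 0)
        return -1;
    if (bang && copy_piece(out->program, sizeof out->program,
                           bang + 1, (size_t)(colon - bang - 1)) < 0)
        return -1;
    return 0;
}
```

With this sketch, "mouse@host!/usr/bin/compare:/src" comes back as user
mouse, machine host, program /usr/bin/compare, and directory /src,
while "./a:b" is local because a slash precedes the first colon.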

					der Mouse