Subject: Re: How to back up from one hard disk to another... ?
To: Robert Kennedy <robert@cs.stanford.edu>
From: Robert Elz <kre@munnari.OZ.AU>
List: netbsd-help
Date: 11/26/2001 19:58:54
    Date:        Sun, 25 Nov 2001 17:19:39 -0800 (PST)
    From:        Robert Kennedy <robert@theory.Stanford.EDU>
    Message-ID:  <200111260119.RAA02487@theory-lab2.Stanford.EDU>

  | The ideal thing would seem to be tar with the --newer flag, except for
  | two things:

  | 2) Thing two is a very fundamental issue that arises no matter what
  |    software I use to back up incrementally, namely that --newer and
  |    its equivalents in other utilities are based on the modification
  |    time of the files, and files get touched, installed with times in
  |    the past, etc. Checking the inode change time (like pax -T
  |    <date>/cm does) might be a partial fix for this issue, but doesn't
  |    seem to address it completely, and isn't an option with some
  |    utilities (tar included).

What isn't addressed completely by that?   (Assuming a way to access it
exists of course)

  | An additional annoying thing about find and
  | pax is that they don't provide a convenient way to ensure that the
  | traversal doesn't cross filesystem boundaries.

Don't know about pax, but find has no problem with this (-x)
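
For example (the paths here are just placeholders), to walk only the
filesystem that / is on:

    # -x keeps find from descending into other mounted filesystems;
    # the timestamp file is just whatever marker you compare mtimes against
    find / -x -newer /var/backups/last-backup -print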

  | How does dump handle the issue of files "touch"ed into the past? In
  | other words, how does it know which files need to be backed up?

The inode change time, which exists precisely for this purpose.

  | dump obviously isn't useful for the task I have in mind because it seems to
  | think strictly in terms of creating a monolithic archive file, not in
  | terms of copying files needing backup to individual target files on
  | the backup disk.

Actually, dump is perfect for this task - dump makes backups; it is the
only real tool dedicated to that purpose, and it is almost certainly what
you should be using (rsync, as suggested, is another possibility, with
slightly different capabilities).

If you want the backup expanded (individual files, rather than the
monolithic file), just use dump -f - | (cd destination; restore -rf -)
(and all the other options for dump of course).
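
Spelled out a little more (just a sketch - substitute your own filesystem
and backup directory, and check dump(8) and restore(8) for the rest of the
options):

    # level 0 (full) dump of /home, expanded straight onto the backup disk;
    # -u records the dump in /etc/dumpdates so later levels know where to start
    dump -0u -f - /home | (cd /backup/home && restore -rf -)

    # a later level 1 copies only what changed since the level 0
    dump -1u -f - /home | (cd /backup/home && restore -rf -)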

The big advantage of dump is that it will allow you to keep multiple
different versions of backups, with each containing just changes since
the previous one, so if you delete some trash files today, do a
backup tonight, and then realise that the trash files were really
important files, you can still recover them from the previous day's
backup (or an earlier one if they hadn't been touched in a while).

The other big advantage is that it reads the raw filesystem - which makes
it faster (though that's just a by-product really); more importantly,
the access time of the files isn't altered, so you can see when
the files really were last used by something - not just when they were
last backed up...

Dump has the drawback that it really thinks it should be writing to
a tape-type device, with that kind of backup strategy in mind, so it doesn't
have a good way of continually backing up just what has changed since the
last backup; instead it has its levels, and backs up all that has changed
since the last dump of a lower level (which can only be "yesterday" for
9 days, then you run out of levels...).   This could probably be faked by
something that does a little surgery on the dumpdates file, provided that
you are expanding the dumps (immediately, using restore) and never want
to actually restore from the (monolithic) dump output files back into the
original filesystem (peculiar things would happen).
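
To make the levels concrete (the device name is only an example), dumpdates
just keeps one line per filesystem per level, something like

    /dev/rsd0e 0 Sun Nov 25 01:05:12 2001
    /dev/rsd0e 1 Mon Nov 26 01:05:33 2001

so a typical scheme is a level 0 once a week, then 1, 2, 3, ... on the
following nights, each picking up what changed since the previous night's
lower-level dump.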

  | Assuming dump really has a way of knowing (in spite of touch, install,
  | tar xpf -, etc.) which files need to be backed up,

It does.

  | is there a way to get it to just list the full pathnames of those files

Not directly, though dump | restore -t would do that (a bit expensive,
admittedly).
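
i.e. something along these lines (a sketch - the filesystem is just an
example):

    # writes the whole incremental down the pipe just to list the names,
    # hence "a bit expensive"; no -u, since this isn't a real backup
    dump -1 -f - /home | restore -tf -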

  | so I can then
  | use tar or something similar to actually do the file copies?

Just use dump | restore

  | Got any suggestions of other schemes I could use?

raidframe was suggested - that does something quite different. As
was said, if you want the system to just keep on running, then it
is perfect; but if you really want backups (as in: I just deleted this
file, no problem, it is in the backup, I will get it back from there...)
then raidframe is useless...

  | Maybe I should just dd one drive to the other anyway, huh? But it sure
  | does take a long time (both drives live on the same SCSI adapter), and
  | it sort of scares me to do it that way (see end of previous
  | paragraph).

If the drive is full, and the two drives are close to identical, then
aside from the risk that you know about, this can be a good way.   If the
drive isn't (nearly) full, dd wastes lots of energy copying unallocated
space from one place to another...
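
If you do go the dd route it is just (device names are examples only -
triple check them, as swapping if= and of= overwrites the source disk):

    # raw copy of the whole disk; a larger block size than the default
    # speeds things up considerably
    dd if=/dev/rsd0d of=/dev/rsd1d bs=64k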

  | Surely this is a common (and well-solved) problem. Isn't it?!?

We have had dump for backups for years.    Unless you really need to have
the files unpacked in directory trees for some reason, I'd just keep them
in files: it is much safer on the off chance that something dies during the
dump (the old backup file can still be there), and you can compress those
files, and so have more backups than you'd ever get with unpacked files (ie:
you can have your complete filesystem backed up in full, and still have all
the files that you had to remove to make space for the new ones that filled
it).
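
For example (again only a sketch, names and levels to taste):

    # keep the dump itself as a compressed file on the backup disk
    dump -0u -f - /home | gzip -9 > /backup/home-level0.dump.gz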

Extracting the files using restore is pretty easy when you need them back
again, which really, shouldn't be too often, should it?
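
Something like this (a sketch, matching the example file name above):

    # interactive restore - cd around inside the dump, "add" the files
    # you want back, then "extract"
    gzip -dc /backup/home-level0.dump.gz | restore -if -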

If you do this, install amanda and use it to do the dumps; then you'll even
get indexes built so you know which file is in which dump file, and you
can more easily find the right one to restore from (which is the one problem
with lots of dump output files (or tapes) - unless you know when a file was
last altered before being destroyed, it is hard to guess which dump file to
look in to get it back).

kre