Subject: How to back up from one hard disk to another... ?
To: None <netbsd-help@netbsd.org>
From: Robert Kennedy <robert@theory.Stanford.EDU>
List: netbsd-help
Date: 11/25/2001 17:19:39

I'm seeking advice on something that should be very simple. Before I
thought about it I expected I could get this job tackled in about 15
minutes, but it's consumed several hours so far and it still isn't
done. I basically don't know how to do it, even after reading quite a
large number of man pages.

The idea is this: I have two disk drives that are identical except for
the data stored on them. One of them is my main drive, and the other
is my backup drive. Periodically I would like to back up the main
drive to the backup drive.

Fine, I can just dd one drive to the other. Indeed, that seems to be
the only option that will really work, but I was hoping for something
that doesn't have to read and write the entire disk. Plus this is a
scary thing to do, because it means I don't have a usable backup at
all while the dd is in progress. I would like a way to back up just
those files that have changed (or been created, or been deleted) since
the last time I did a backup, leaving the backup filesystem consistent
(except perhaps for whatever file is being copied) while the backup is
happening.

The ideal thing would seem to be tar with the --newer flag, except for
two things:

1) Thing one is that the date-string-parsing stuff in tar doesn't seem
   to work. I can't figure out a format it will accept for the date. I
   suspect this might be a Y2K thing that has been fixed; I'm running
   NetBSD 1.3 on i386, in case it matters. Thinking this was probably
   the case, I downloaded the gtar-1.12 package for NetBSD 1.3.3/i386
   (should be close enough, right?), and it, too, fails to understand
   --newer correctly (although the failure mode is slightly
   different).

2) Thing two is a very fundamental issue that arises no matter what
   software I use to back up incrementally, namely that --newer and
   its equivalents in other utilities are based on the modification
   time of the files, and files get touched, installed with times in
   the past, etc. Checking the inode change time (like pax -T
   <date>/cm does) might be a partial fix for this issue, but doesn't
   seem to address it completely, and isn't an option with some
   utilities (tar included).
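
The difference between the two timestamps in thing two can be shown
concretely. What follows is only an illustrative sketch, and Python is
an assumption here (nothing in the setup described above uses it); the
names stamp_times and demo are hypothetical. It creates a file, pushes
its modification time a year into the past the way "touch" might, and
then checks both timestamps against a pretend "last backup" time. It
does not address deleted files at all.

```python
import os
import tempfile
import time


def stamp_times(path):
    """Return (mtime, ctime) for path without following symlinks."""
    st = os.lstat(path)
    return st.st_mtime, st.st_ctime


def demo():
    """Return (newer_by_mtime, newer_by_ctime) for a back-dated file."""
    # Hypothetical reference point: pretend the last backup ran an hour ago.
    reference = time.time() - 3600
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "example")
        with open(path, "w") as f:
            f.write("new data\n")
        # Push atime and mtime one year into the past, as "touch" or an
        # installer that preserves timestamps might do.
        old = time.time() - 365 * 24 * 3600
        os.utime(path, (old, old))
        mtime, ctime = stamp_times(path)
        return mtime > reference, ctime > reference


if __name__ == "__main__":
    print(demo())
```

On a typical Unix filesystem this should return (False, True): a test
on the modification time misses the file, while the inode change time
still shows that something happened to it after the reference point.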

So then I thought, OK, I'll write a find command to list all the new
files that need to be backed up, and then tar those files explicitly
over to the backup disk. This sort of works, but it has two
bugs. First, there isn't a convenient way to get find to exclude
nonempty directories from its output list. At least I couldn't figure
out a way. The reason I want this is that empty directories should be
listed in the set of files for tar to archive, but nonempty
directories should *not* be listed. The second bug with this idea is the
same as thing two above: There's no way to pick up files that have
been created or modified or installed recently, but that have been
"touch"ed into the past. An additional annoying thing about find and
pax is that they don't provide a convenient way to ensure that the
traversal doesn't cross filesystem boundaries. tar does provide that,
but as we've seen, tar is unusable because --newer apparently doesn't
work.
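
The selection rule described in the paragraph above (list changed
files and empty directories, never list nonempty directories, and stay
on one filesystem) can be sketched as follows. This is a minimal
sketch under stated assumptions, not a tested backup tool: Python is
an assumption, changed_since is a hypothetical name, "changed" is
taken to mean that either the modification time or the inode change
time is later than the reference time, and a directory that lives on a
different device from the starting point is skipped entirely.

```python
import os


def changed_since(root, since):
    """List paths under root changed after `since` (seconds since epoch).

    Non-directory entries and empty directories are listed; nonempty
    directories are never listed. Subdirectories on a different device
    than root are skipped, so the walk stays on one filesystem.
    """
    root_dev = os.lstat(root).st_dev
    selected = []

    def visit(directory):
        with os.scandir(directory) as it:
            entries = sorted(it, key=lambda e: e.name)
        if not entries:
            # Empty directory: list it only if it changed itself.
            st = os.lstat(directory)
            if max(st.st_mtime, st.st_ctime) > since:
                selected.append(directory)
            return
        for entry in entries:
            st = entry.stat(follow_symlinks=False)
            if entry.is_dir(follow_symlinks=False):
                # Descend only while still on the starting filesystem.
                if st.st_dev == root_dev:
                    visit(entry.path)
            elif max(st.st_mtime, st.st_ctime) > since:
                # Files, symlinks, and other non-directory entries.
                selected.append(entry.path)

    visit(root)
    return selected
```

The resulting list could then be written one pathname per line to an
archiver that reads its file list from standard input, which pax does
in write and copy modes when no file operands are given. Files deleted
since the last backup are not detected by this approach.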

How does dump handle the issue of files "touch"ed into the past? In
other words, how does it know which files need to be backed up? dump
obviously isn't useful for the task I have in mind because it seems to
think strictly in terms of creating a monolithic archive file, not in
terms of copying files needing backup to individual target files on
the backup disk.

In summary:

Can someone show me how to use tar --newer?

Assuming dump really has a way of knowing (in spite of touch, install,
tar xpf -, etc.) which files need to be backed up, is there a way to
get it to just list the full pathnames of those files so I can then
use tar or something similar to actually do the file copies?

Got any suggestions of other schemes I could use? Keep in mind that
the scheme should behave correctly with hard links (which, under some
circumstances, tar and pax and presumably dump do), should work for
empty directories, and should provide a way to copy exactly those
things that have been changed since the last time a backup was
done. It should also have the property that if something goes terribly
wrong during the course of a backup and the main drive gets destroyed, the
backup drive can be left in a state self-consistent enough to be
recovered with fsck. The "dd" solution lacks this property, and that's
a big problem.

Maybe I should just dd one drive to the other anyway, huh? But it sure
does take a long time (both drives live on the same SCSI adapter), and
it sort of scares me to do it that way (see end of previous
paragraph).

Surely this is a common (and well-solved) problem. Isn't it?!?

Gratefully looking forward to any advice.

	-- Robert Kennedy