Subject: Re: Backup: 'tar' or 'dump'?
To: Jon Ribbens <jon@oaktree.co.uk>
From: Jonathan Marsden <Jonathan@XC.Org>
List: netbsd-help
Date: 03/21/1997 15:08:15
On 21 Mar 1997, Jon Ribbens writes:

> Anyway, my main question is whether I should use 'dump' or 'tar' to
> backup my data (at least 2GB). I would use 'dump', since it seems
> the obvious choice, but 'restore' seems very unreliable, and there
> is no way I can find to test a 'dump' backup without having a blank
> drive to restore the data onto. ('restore -t' doesn't actually read
> the whole tape.)

I believe that it is possible to do something like

    dd if=/dev/nrst0 of=/dev/null bs=10240

and read the entire dump.  The summary output from dd should, I think,
indicate any bad blocks encountered.  This is not a file-by-file,
inode-by-inode, bit-by-bit verify of the dump, but it may catch bad
spots in your tapes.
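
A minimal sketch of that read-through check, wrapped so that a script
can act on the result.  The function name read_whole_dump is made up
for illustration, and the sketch assumes that dd exits non-zero when a
read fails; dd's own summary still goes to stderr.

```shell
#!/bin/sh
# read_whole_dump SOURCE [BLOCKSIZE]
# Read SOURCE from start to end and throw the data away.  Only the
# exit status of dd is used to decide whether the read succeeded.
# (Hypothetical helper, not part of dump or restore.)
read_whole_dump() {
    src=$1
    bs=${2:-10240}          # same block size as the dd command above
    if dd if="$src" of=/dev/null bs="$bs"; then
        echo "read OK: $src"
    else
        echo "read FAILED: $src" >&2
        return 1
    fi
}
```

After rewinding the tape, it could be called as
"read_whole_dump /dev/nrst0 10240".  Like the bare dd command, this
only shows that the tape is readable, not that its contents are right.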

I confess I just use dump and then restore -t to generate online
dumptocs, and then every couple of weeks I use restore -i to restore a
random file from the latest dump tape (into /tmp) and compare it with
the original file, as a modest reassurance check that restoring is
possible.  Every quarter I (should, at least!) attempt a restore using
a different DAT drive and machine, too, as a further test of backup
usefulness in the event of a real disaster.
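
The compare step of that spot check might look like the sketch below.
check_restored is a hypothetical helper name; it assumes the file has
already been pulled off the tape with restore -i, and that the original
has not changed since the dump was taken (a mismatch on a file that is
edited often proves nothing).

```shell
#!/bin/sh
# check_restored ORIGINAL RESTORED
# Compare a restored copy byte-for-byte with the file still on disk.
# cmp -s prints nothing and reports the result via its exit status.
check_restored() {
    if cmp -s "$1" "$2"; then
        echo "MATCH: $1"
    else
        echo "DIFFERS: $1" >&2
        return 1
    fi
}
```

restore extracts relative to the current directory, so after running
restore -i in /tmp on a dump of /, a call would be something like
"check_restored /etc/services /tmp/etc/services".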

> I have had a lot of problems with dump/restore, which up to this point
> I have been using to backup to another hard drive, e.g.:
> 
>   dump -0u -f - / | gzip >/mnt/backup.gz
> 
> When I come to restore the data, I've had various random-seeming
> errors, to do with restore wanting to 'change volumes' (huh?) and then
> coredumping (before setting the permissions on all the directories,
> aargh). 'dump' also seems to randomly say 'Signal on pipe: cannot
> recover', and I have to start the whole thing again.

I've never seen that, in many months of using NetBSD dump and restore
(NetBSD/i386 1.1 and now 1.2, on a Pentium and now a Pentium Pro-200).
Admittedly, I only dump to tape, and don't do compression other than
the built in hardware DDS-DC or DDS-2 compression in the tape drive.

The 'signal on pipe' problem is perhaps a consequence of using a pipe
for the dump output file instead of using a more conventional filename
or devicename?

The dump commands in a level zero dump script here are:

    /usr/bin/time /sbin/dump 0ubdsf  64 61000  5000 /dev/nrst0  /
    /usr/bin/time /sbin/dump 0ubdsf  64 61000  5000 /dev/nrst0  /usr
    /usr/bin/time /sbin/dump 0ubdsf  64 61000  5000 /dev/nrst0  /var
    /usr/bin/time /sbin/dump 0ubdsf  64 61000  5000 /dev/nrst0  /usr/local

and so on.  We use time because we redirect the script output and
retain it; seeing how long the dump of each partition takes sometimes
helps when planning overnight dump schedules.  The density and length
values were arrived at by trial and error!

    YYMMDD=`/bin/date +%y%m%d`
    export YYMMDD

    for i in 0 1 2 3
    do
      /bin/mt rew
      /bin/mt fsf $i
      restore -t -b 64 -f /dev/nrst0 >dumptoc.0.${YYMMDD}.$i
    done
    /bin/mt rew

then generates the dumptocs for each of the partitions dumped on
that particular tape.

Interactive restore using

    restore -i -b 64 -f /dev/nrst0

works fine for me.

> So, I think maybe I'd like to use 'tar' instead. At least it's
> simple, it can cope if the backup file is truncated, and I trust
> it a lot more. My only question is whether it is suitable for
> this sort of thing - will it cope okay with devices, FIFOs, hardlinks,
> symlinks, files with holes, etc? 

According to the sysadmin bible (Unix System Administration Handbook,
by Nemeth, Snyder, Seebass and Hein), tar does not read or write
device files, expands holes, and is intolerant of tape errors.  I
don't know if the GNU tar reliably overcomes those limitations.

Also, tar is generally unable to handle multi-volume backups, and
definitely (from personal experience) has a hard limit on path length
that is really annoying when you bump into it -- files you expect to
be added to the tar file simply are not added...
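
One way to find out in advance which files would be affected is to
list the paths that exceed the limit.  The sketch below assumes the
100-byte name field of the classic tar header; the real limit of your
tar may differ, so the maximum is a parameter.  long_paths is a made-up
name, and file names containing newlines are not handled.

```shell
#!/bin/sh
# long_paths DIR [MAX]
# Print every path under DIR whose length is greater than MAX bytes.
# MAX defaults to 100 (an assumption, see the text above).
long_paths() {
    dir=$1
    max=${2:-100}
    # LC_ALL=C makes awk count bytes rather than multibyte characters.
    find "$dir" -print | LC_ALL=C awk -v max="$max" 'length($0) > max'
}
```

Run it from the directory you would run tar in, with the same path
argument (for example "long_paths ."), because the length that matters
is that of the name as it would be stored in the archive.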

I also believe that the act of creating the tar archive will change
last access times of all files read... dump doesn't do that.  Whether
that matters to you depends on whether you ever care how long it is
since a file has been accessed.  It can be useful occasionally to
decide whether some huge file or set of files can be archived off to
tape to save disk space... if they've not been accessed in six months,
they're a good candidate :-)
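
For what it is worth, that kind of check can be done with find, as in
the sketch below (stale_files is a hypothetical name).  It is only
meaningful if nothing, the backup program included, has been resetting
access times in the meantime.

```shell
#!/bin/sh
# stale_files DIR [DAYS]
# List regular files under DIR that were last accessed more than DAYS
# days ago.  DAYS defaults to 180, roughly six months.
stale_files() {
    find "$1" -type f -atime +"${2:-180}" -print
}
```

For example, "stale_files /usr/local 180" lists the candidates under
/usr/local.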

All of that suggests to me that dump is often the more appropriate
general purpose filesystem backup tool.  As the sysadmin bible says in
Section 11.3:

    The dump and restore commands are the most common way to create and
    restore from backups.  These programs have been part of UNIX for a
    very long time, and their behaviour is well known.  For most
    sites, dump and restore are the backup method of choice.

I'd suggest looking at those dump.core files to see what is going on
on your machine...  dump and restore are old, but reliable and trusted
tools.

> Can I just do
> 
>   tar cl /
> 
> to backup and later 'tar x' to restore? 

Well, at a minimum I would think that you'd want to add the 'p' flag
to the restore, so all permissions info gets restored :-)

> Do I need the 'S' option to handle sparse files 'more efficiently'?

If you have a bunch of sparse files and don't want their holes
expanded, yes.

Like the (relatively few!) other BSD sysadmins I have spoken to, I
trust dump and restore a lot.  I've met bad tape media, and bad tape
drives, but never (yet?)  had a problem relating to a bug or
unreliability in dump or restore, under NetBSD or BSDi -- nor under
Linux or Irix.

The only "problem" I have met with NetBSD dump is that the man page is
incorrect (or at least misleading).  It says:

     The following options are supported by dump:

     -0-9 Dump levels.  A level 0, full backup, guarantees the entire
          file system is copied (but see also the -h option below).  A
          level number above 0, incremental backup, tells dump to copy
          all files new or modified since the last dump of the same or
          lower level.

The last phrase suggests (to me) that doing a level N dump followed
immediately by a second level N dump will result in no files being
dumped at all in the second of the pair.  That isn't how it works
(I've tested it!).  That isn't how the Irix man page says it works,
nor how the sysadmin bible says it works.  The minimal fix is to remove
the words "the same or" from the NetBSD 1.2/i386 dump(8) manpage.

I don't know if that has been fixed in NetBSD-current.  It probably
ought to be :-)
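
To make the difference concrete, here is a small model of the rule as
I understand it: a level N dump copies everything changed since the
most recent dump of a strictly lower level.  This is a sketch of the
behaviour described above, not code taken from dump itself, and
baseline_for is a made-up name.

```shell
#!/bin/sh
# baseline_for NEW_LEVEL PREVIOUS_LEVEL...
# Given the levels of earlier dumps in chronological order, print which
# one a new dump at NEW_LEVEL is taken relative to: the most recent
# earlier dump whose level is strictly lower.  Prints "none" when no
# such dump exists, in which case everything is dumped.
baseline_for() {
    new=$1
    shift
    pos=0
    found=none
    for lvl in "$@"; do
        pos=$((pos + 1))
        if [ "$lvl" -lt "$new" ]; then
            found="dump #$pos (level $lvl)"
        fi
    done
    echo "$found"
}
```

With a level 0 followed by a level 5 already done, "baseline_for 5 0 5"
reports the level 0 dump, so a second level 5 dump copies everything
changed since the level 0, not nothing.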

Jonathan
--
Jonathan Marsden   | Internet: jonathan@xc.org  | Making electronic 
1849 N. Wabash Ave.| Phone: +1 (909) 794 1151   | communications work 
Redlands, CA 92374 | FAX:   +1 (909) 794 3016   | reliably for Christian 
USA                | http://www.xc.org/jonathan | missions worldwide