Subject: Re: Amanda backups: gtar or dump?
To: None <current-users@netbsd.org>
From: Greg A. Woods <woods@most.weird.com>
List: current-users
Date: 10/25/1998 19:17:01
[ On Mon, October 26, 1998 at 09:12:12 (+1100), Simon J. Gerraty wrote: ]
> Subject: Amanda backups: gtar or dump?
>
> I've just finished setting up amanda here (been on my todo list for
> over a year :-) and am trying to decide whether I should use dump or
> gnu tar and would appreciate opinions of others.

Dump is always preferable, and IMNSHO almost any other file-by-file
backup system is better than GNU Tar.

However it really depends on what your purpose is....

- if you want quick and safe restores after catastrophic disasters, or
  easy restores of individual files lost by accident, then choose dump.
  Nothing beats dump as a system backup tool, and it's usually faster
  than any file-by-file scheme can possibly be.  Restoring files from
  dump archives is quite easy, and Amanda can help with selecting the
  correct tape set, positioning the tape, etc. too.

- if you want portability and long-term archiving of data files, then
  something like tar or cpio (preferably using pax to generate and read
  either format) is probably your best choice.  Your media is likely to
  be extinct long before the tar or cpio formats are.  I don't know that
  Amanda can make easy use of pax "out-of-the-box" though, except
  perhaps via the tar or cpio command-line emulation.

> I also noted that the example disklist shows a preference for
> compressing dumps - but my DAT (Seagate/Archive python) drive does
> compression so I assume it's better to let the tape drive do it?
> Just letting the drive do compression would certainly help amanda with
> its tape space estimates.

If you're paranoid about rsh then you should be doubly paranoid about
backup compression.  Unless you can guarantee that the compression
algorithm can deal gracefully with loss of data in the middle of the
file then you'd best avoid it altogether.  I think most tape drives
use hardware compression that's similar to the LZ compression most
modems do, and it doesn't rely on a dictionary that's stored within the
stream (i.e. they should be able to skip past corrupted blocks and
continue feeding good data after breakage).  GNU Zip (and Unix Compress)
on the other hand can only recover data up to the point of corruption.
There are higher-performance dictionary-based algorithms that can
recover after corruption in the stream, but all I know about them is
that they can lose more data than just what was packed into the
corrupted blocks.
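
To illustrate the difference, here is a minimal Python sketch (an
illustration only, not how gzip(1) or any real drive works internally).
It uses Python's zlib module for both cases: one continuous stream, as
gzip or compress would produce, versus each tape record compressed
independently, which is assumed here to approximate a drive that
restarts its compression state on every block.  The record size BLOCK,
the sample data, and all helper names are hypothetical.  One byte in
the middle is damaged in each case, and the sketch counts how much good
data comes back.

```python
import zlib

BLOCK = 16 * 1024  # hypothetical tape record size (uncompressed bytes)


def make_data(n_records: int = 20000) -> bytes:
    # Compressible but non-trivial sample payload (14 bytes per line).
    return b"".join(b"record %06d\n" % i for i in range(n_records))


def corrupt(buf: bytes, pos: int) -> bytes:
    # Simulate a bad spot on the tape by flipping every bit of one byte.
    b = bytearray(buf)
    b[pos] ^= 0xFF
    return bytes(b)


def common_prefix(a: bytes, b: bytes) -> int:
    # Number of leading bytes that are identical in a and b.
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def restore_single_stream(data: bytes) -> int:
    """One zlib stream for everything: count intact bytes recovered."""
    packed = zlib.compress(data)
    damaged = corrupt(packed, len(packed) // 2)
    d = zlib.decompressobj()
    out = bytearray()
    try:
        # Feed the "tape" in small pieces, keeping whatever decodes.
        for i in range(0, len(damaged), 64):
            out += d.decompress(damaged[i:i + 64])
        out += d.flush()
    except zlib.error:
        pass  # cannot resynchronise past the damage
    # Anything decoded after the damage is garbage, so only count the
    # prefix that still matches the original.
    return common_prefix(bytes(out), data)


def restore_per_block(data: bytes) -> int:
    """Each record compressed on its own: skip records that fail."""
    records = [zlib.compress(data[i:i + BLOCK])
               for i in range(0, len(data), BLOCK)]
    bad = len(records) // 2
    records[bad] = corrupt(records[bad], len(records[bad]) // 2)
    recovered = 0
    for rec in records:
        try:
            recovered += len(zlib.decompress(rec))
        except zlib.error:
            continue  # lose this record only; the next starts fresh
    return recovered


if __name__ == "__main__":
    data = make_data()
    print("total bytes:       ", len(data))
    print("single stream kept:", restore_single_stream(data))
    print("per-record kept:   ", restore_per_block(data))
```

Running it should show the per-record scheme losing only the damaged
record, while the single stream loses everything after the damage.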

The problem with all this, of course, is that unless you can simulate
compressing the archives on the tape host with exactly the same
algorithm the tape drive hardware uses, in order to determine exactly
what will fit on the tape, Amanda will be forced to assume that the
tape is not compressing, and thus you may as well not enable
compression.

The one big advantage of tape compression that most hardware vendors
push, and especially those pushing 4mm and 8mm helical scan drives, such
as DAT and Exabyte, is that it can speed up the transfer rate.  If you
need the additional speed afforded by compression and either you're
certain it won't affect the recoverability of tapes with bad blocks, or
you deem the risk level of corrupted tapes to be acceptable, then you
can turn on the tape drive's compression -- just don't tell Amanda you
did so, and don't feel disappointed when you get tapes that are only 70%
full as a result.
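
The "70% full" figure is simple arithmetic: Amanda schedules only as
much data as the native (uncompressed) tape length allows, and the
drive then squeezes that data by its compression ratio, so the fraction
of physical tape actually used is roughly 1/ratio.  A minimal sketch,
where the function name and the example figures (a hypothetical 4 GB
native tape and a hypothetical 1.43:1 ratio) are illustrative
assumptions:

```python
def physical_fill(native_length: float, scheduled: float, ratio: float) -> float:
    """Fraction of the physical tape actually used.

    native_length -- tape length as told to Amanda (no compression)
    scheduled     -- amount of data Amanda writes (at most native_length)
    ratio         -- average compression ratio the drive achieves (assumed)
    """
    if ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return (scheduled / ratio) / native_length


# Hypothetical figures: Amanda fills a 4 GB native tape, drive gets 1.43:1.
fill = physical_fill(4.0, 4.0, 1.43)
print(f"tape used: {fill:.0%}")  # prints "tape used: 70%"
```

The real ratio varies with the data, which is exactly why Amanda cannot
plan for it in advance.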

Of course, the reason many tape drive vendors are pushing compression
is that they're getting beaten over the head by technologies with a
higher native transfer rate, such as DLT, MLR, etc.  Why anyone would go
with a technology that has several orders of magnitude more moving parts
*and* a slower native transfer rate beats me.  Yes, price is an
objection, but that's only because the slow ones leverage the
mass market for similar video and audio drives, and the economics of
such a situation are hard to turn around.  The only reason I have an
Exabyte drive is that it was given to me and it's better than what I
had before....  Otherwise, you get what you pay for.

> Input appreciated.
> 
> Also, I was a bit reluctant to re-enable rsh on my systems - even
> though it is blocked at the ppp link.  Has anyone hacked amanda to use
> ssh?  Otherwise I might just munge it to use ssl_rcmd().
> 
> --sjg

You'll want to put filters on your gateway to prevent outsiders from
poking at your "amanda" and "amandaidx" services....

-- 
							Greg A. Woods

+1 416 218-0098      VE3TCP      <gwoods@acm.org>      <robohack!woods>
Planix, Inc. <woods@planix.com>; Secrets of the Weird <woods@weird.com>