Subject: Re: Backup to Tape
To: NetBSD User's Discussion List <netbsd-users@NetBSD.ORG>
From: Greg A. Woods <woods@weird.com>
List: netbsd-users
Date: 06/19/2003 18:11:49
[ On Thursday, June 19, 2003 at 13:59:12 (-0700), collver1@attbi.com wrote: ]
> Subject: Re: Backup to Tape
>
> Not so.  I regularly back up filesystems that are hundreds of gigabytes in
> size.  You simply need to use backup software that can have backup objects
> span multiple tapes, which is commonplace.

Yes, even dump can do it, BUT.... there are many caveats....

> You can also find higher capacity tapes, like LTO.

You can trivially create quite massive-capacity RAID-0+1 filesystems
with a decent sized set of modern high-capacity drives.  Also remember
that 500GB ATA drives are very nearly on the doorstep and terabyte
single drives are probably just around the corner (if some form of
"solid state" storage doesn't arrive first).  It'll take some incredible
advances in tape technology to keep up with even single-drive capacity
(never mind access speed), let alone also keeping the reliability as
high as it is for even DLT-IV.  (I won't believe the SDLT reliability
numbers until there's some real-world experience to back them up!  :-)

> LTO2 provides 20MB/second native, which is about the write speed you get in
> the real world from ATA133.

Actually you really mean it's about 1/2 the speed of ATA-133 if you use
a drive for sequential writes as you would use a tape (apples to apples!):

              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
NBSD-ATA 3000 41865 34.7 40128 13.2 10464  2.8 40413 34.4 40832  6.0  93.9  0.5
FBSD-ATA 3000 45998 33.5 44012 10.1 22631  7.9 42423 34.4 45041  6.2 152.1  0.4

(and that's from last quarter's mass-produced generic ATA drive!)
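
As a rough check of the "about 1/2" figure, the block-write column of the
table above can be compared against the quoted 20MB/second native LTO2
rate.  This is only a sketch: it assumes the K/sec column means units of
1024 bytes and that 1 MB = 1024 K, and the names used here are
illustrative, not part of any tool.

```python
# Block sequential-write rates (K/sec) copied from the table above.
BLOCK_WRITE_K_PER_SEC = {"NBSD-ATA": 40128, "FBSD-ATA": 44012}

# Native LTO2 rate as quoted in the message being replied to.
TAPE_MB_PER_SEC = 20.0


def disk_mb_per_sec(k_per_sec):
    """Convert K/sec to MB/sec (assumption: 1 MB = 1024 K)."""
    return k_per_sec / 1024.0


def tape_fraction(k_per_sec):
    """Tape write rate expressed as a fraction of the disk's rate."""
    return TAPE_MB_PER_SEC / disk_mb_per_sec(k_per_sec)


if __name__ == "__main__":
    for machine, rate in BLOCK_WRITE_K_PER_SEC.items():
        print(f"{machine}: disk {disk_mb_per_sec(rate):.1f} MB/s, "
              f"tape is {tape_fraction(rate):.2f} of that")
```

Under those unit assumptions both ratios come out close to one half.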

> Typically when you are restoring data, you restore a chunk, for example a
> whole filesystem or directory tree.

Yes, but very often that's so the user requesting the restore can
extract one or two files from within it.

Regardless, if you have full on-line backups, including full on-line
archives in filesystems, restoring any amount of data, from one file to a
whole system's worth of files, is just as easy as copying the desired
file(s) to wherever you want them using the exact same tools you'd use
if they were kept in your $HOME.
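
To illustrate how little machinery such a restore needs, here is a
minimal sketch using only ordinary file-copy calls.  The directory
layout (a backup tree that mirrors the live tree) and the function name
restore() are assumptions made for the example, not a description of any
particular backup setup.

```python
import shutil
from pathlib import Path


def restore(backup_root, relative_path, dest_root):
    """Copy one file or a whole directory tree out of an online backup.

    Assumes (hypothetically) that the backup tree under backup_root
    mirrors the layout of the live tree under dest_root.
    """
    src = Path(backup_root) / relative_path
    dst = Path(dest_root) / relative_path
    # Make sure the destination's parent directories exist.
    dst.parent.mkdir(parents=True, exist_ok=True)
    if src.is_dir():
        # Whole-tree restore; dirs_exist_ok needs Python 3.8 or later.
        shutil.copytree(src, dst, dirs_exist_ok=True)
    else:
        # Single-file restore, preserving timestamps and permissions.
        shutil.copy2(src, dst)
    return dst
```

The same call handles a single file or a whole directory tree, and no
tape scheduling or seeking is involved.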

>  It is common for backup software to be
> able to seek to a spot on the tape, and restore the data very quickly.

Not anywhere near as fast as you can find it on a live filesystem, even
if you ignore the time it takes to physically find and load the tape,
and even if you ignore the time it takes to schedule access to your one
or very few very expensive tape drives that are primarily scheduled to
be writing backups.

The advantages of having full online copies of all your data should not
be overlooked -- they eliminate many of the hidden and not-so-hidden
costs of using "serial" off-line backup media, not the least of which
are the complications of dealing with backup media management issues.

Even if you use your drives to store compressed archive images of
backups from remote systems, the reduction in management headaches (media
and device scheduling, for example) is quite significant.

If you can't keep everything online then creating off-line archive
copies of normally un-needed old data on whatever media has the
appropriate capacity and shelf life characteristics is a far better
approach than forcing your whole backup strategy to deal with the
limitations imposed by such media.

I don't know how far up this idea scales with current technology, but I
suspect it could go quite far.  I have personally loaded almost all of
my own archive data onto live filesystems and I still haven't managed to
come anywhere near filling a couple of very modest 10GB and 20GB
filesystems.  I couldn't even begin to conceive of what I'd do with even
just 100GB unless I started downloading music or video, or archiving
photos online or something.  I suppose I could keep multiple
generations of even un-compressed backups online, but even then I don't
know what I'd do with the space, since I already use change tracking
tools like RCS and SCCS to keep multiple generations of those of my files
where it makes sense to do that.

Where things really start to get interesting is when you consider using
something like the Elephant file system, perhaps with extensions to
allow for migration from really high-speed devices to slower-speed
devices (so that your SAN/NAS and "backup" storage are all seamlessly
integrated) and then combining that with mirroring so that not only is
all of your data available in ready-to-use form off-site, but also _all_
of the change history for all your data is available both online and
off-site, with only something like the last 24 hours missing from the
off-site copy.

-- 
								Greg A. Woods

+1 416 218-0098;            <g.a.woods@ieee.org>;           <woods@robohack.ca>
Planix, Inc. <woods@planix.com>; VE3TCP; Secrets of the Weird <woods@weird.com>