Subject: Re: Backup to Tape
To: NetBSD User's Discussion List <netbsd-users@NetBSD.ORG>
From: Greg A. Woods <woods@weird.com>
List: netbsd-users
Date: 06/19/2003 16:17:19
[ On Thursday, June 19, 2003 at 11:18:38 (-0700), collver1@attbi.com wrote: ]
> Subject: Re: Backup to Tape
>
> The following figures from pricewatch show DLT and IDE media to be
> comparable in price.  My guess is that the DLT media will be more reliable,
> depending on the drive used.
> 
> media    US$       gigs   cost/gig   source   notes
> ------   -------   ----   --------   ------   ---------------
> DLT IV   $35.45    40     $0.88      1        Sony brand
> IDE      $44.00    40     $1.10      2        90 day warranty
> IDE      $55.00    60     $0.91      3        90 day warranty
> IDE      $67.00    80     $0.83      4        30 day warranty
> IDE      $86.00    100    $0.86      5        8 mo. warranty
> IDE      $92.00    120    $0.76      6        30 day warranty
> IDE      $120.00   160    $0.75      7        30 day warranty
> 
> 1: http://www.pcnation.com/asp/details.asp?affid=303&item=450997
> 2: http://www.compdisk.com/index.cfm?fuseaction=Products.ProdDetail&PartNumber=4D040H2pw&ID=35
> 3: http://www.compdisk.com/index.cfm?fuseaction=Products.ProdDetail&PartNumber=WD600ABpw&ID=35
> 4: http://www.memorylabs.net/orsa8054rpm2.html
> 5: http://www.upgrade-solution.com/detail.cfm?show=yes&PID=721&add=yes
> 6: http://www.memorylabs.net/orsaspsp1254.html
> 7: http://www.memorylabs.net/saei1654hd2m.html

That's an excellent, but incomplete, summary.  It skims over one vital
issue and fails entirely to show at least one very important issue
(which is further complicated by the first).

The first is the size issue.  Note the capacity gap between DLT IV media
and the lowest cost/GB ATA drives.  There are several implications and
alternatives here.  Obviously the first issue is that you've practically
(though not strictly) got to keep your filesystems down below 40 GB if
you go with DLT IV for backups.  How this impacts a site is perhaps too
difficult to generalize.  If you've got more than 40GB of data to back
up then there are two obvious alternatives:  use a tape robot of some
sort; or use a bigger tape, e.g. SDLT.  According to Pricewatch SDLT
media is still roughly the same price per GB as IDE/ATA "media", though
it caps out at a max of 160GB raw.
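
To make the size issue concrete, here is a rough sketch of the arithmetic
in Python.  The prices and capacities are copied from the quoted table;
the function names are just illustrative, and the 100 GB example data
size is an assumption, not a figure from anyone's site.

```python
import math

# (media, price in US$, capacity in GB), copied from the quoted table.
MEDIA = [
    ("DLT IV", 35.45, 40),
    ("IDE", 44.00, 40),
    ("IDE", 55.00, 60),
    ("IDE", 67.00, 80),
    ("IDE", 86.00, 100),
    ("IDE", 92.00, 120),
    ("IDE", 120.00, 160),
]


def cost_per_gb(price, gigs):
    """Media cost per GB of raw capacity."""
    return price / gigs


def units_needed(data_gb, unit_gb):
    """Whole media units needed to hold data_gb of uncompressed data."""
    return math.ceil(data_gb / unit_gb)


def media_cost(data_gb, price, unit_gb):
    """Media-only cost for one full copy (ignores drives, bays, robots)."""
    return units_needed(data_gb, unit_gb) * price


if __name__ == "__main__":
    data_gb = 100  # hypothetical amount of data to back up
    for name, price, gigs in MEDIA:
        # Rounded here; the quoted table appears to truncate instead,
        # so the last digit can differ by one cent.
        print(f"{name:7s} {gigs:4d} GB  ${cost_per_gb(price, gigs):.2f}/GB  "
              f"{units_needed(data_gb, gigs)} unit(s) for {data_gb} GB  "
              f"${media_cost(data_gb, price, gigs):.2f}")
```

Anything that needs more than one unit per filesystem is where the robot
(or the bigger tape) comes in.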

However, the big kicker is the price of the [S]DLT tape drive.  Yes, you
only buy one of those (until it fails); however, they're still very
expensive, especially compared to the infrastructure necessary to
support ATA drives.

Perhaps even more important is the fact that ATA drives are still
keeping well ahead of tape media in raw capacity per unit and in
transfer speeds.  Think about the relative transfer rates of having one
DLT or SDLT drive in a tape library vs. having three or four ATA drives
writing simultaneously (and much faster).  Even if you do
decide to allow your backup data to be compressed and thus have to burn
a wee bit of CPU to make up for the compression normally done by the
[S]DLT drive, you still don't have to pay any more (assuming any even
half-modern CPU and especially not if you only do the same 2:1
compression a drive can do).
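
For what it's worth, doing the compression on the host is only a few
lines.  The following is a minimal sketch using Python's standard gzip
module; the function names and the default level are my own choices for
illustration, and the ratio you actually get depends entirely on the
data.

```python
import gzip
import shutil


def compress_file(src_path, dst_path, level=6):
    """Stream src_path into a gzip file at dst_path using host CPU."""
    with open(src_path, "rb") as src, \
            gzip.open(dst_path, "wb", compresslevel=level) as dst:
        shutil.copyfileobj(src, dst)


def decompress_file(src_path, dst_path):
    """Restore the original bytes from a gzip file."""
    with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
```

Both functions stream the data in chunks, so the size of the file being
backed up is not limited by memory.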

Even if you increase the cost of your ATA drives by adding to the price
of each the cost of a small foam-filled, water-resistant, aluminum
carrying case, and a "hot-swap" bay, you still end up with more capacity
for lower cost/GB and with almost no up-front cost (certainly not so
steep as a new SDLT drive and controller!).

Now think about this just a wee bit harder:  we're equating ATA random
access high-speed disk drives with tape media!  Think of the
implications!

For one, you can now have full live online backups, as you should easily
be able to have enough raw backup capacity to match your online storage
capacity.  This means your backup "format" can be a regular filesystem,
thus making even your single-file restores almost trivial and capable of
being done directly and securely by the end users!  You can even improve
the performance of your network backups by using something like rsync
and/or unison.  No more level-1 and greater partial backups to contend
with!
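
To illustrate the idea of a backup that is just a regular filesystem
kept up to date, here is a small Python sketch.  It is not rsync and
does not use rsync's algorithm; it only copies files whose size differs
or whose modification time is newer than the backup copy.  The function
names are hypothetical, and it deliberately ignores deletions, symlinks,
and directory permissions.

```python
import os
import shutil


def needs_copy(src, dst):
    """True if dst is missing, differs in size, or is older than src."""
    if not os.path.exists(dst):
        return True
    s, d = os.stat(src), os.stat(dst)
    # Compare whole seconds to tolerate coarse timestamp granularity.
    return s.st_size != d.st_size or int(s.st_mtime) > int(d.st_mtime)


def mirror_tree(src_root, dst_root):
    """Copy new or changed files from src_root into dst_root.

    Returns the list of copied paths, relative to src_root.
    """
    copied = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(target_dir, name)
            if needs_copy(src, dst):
                shutil.copy2(src, dst)  # copy2 also preserves the mtime
                copied.append(os.path.relpath(src, src_root))
    return copied
```

Because the result is an ordinary directory tree, a single-file restore
is just a copy in the other direction.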

For another, you can use RAID-1 (or RAID-0+1) with just three sets of
drives and two sets of hot-swap bays and a pair of carrying cases that
each hold a whole set of drives.  Just split the mirror(s) to take a
backup set off-site, rebuilding the mirror when you return the already
off-site set.  You can expand your backup capacity incrementally just by
adding another three drives.  This means you have 100% full online
backups and 100% full off-site backups, with 100% redundancy as well!
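
The rotation itself can be sketched as a tiny state machine.  This is
only a model of the bookkeeping, assuming three drive sets named A, B,
and C (hypothetical labels); it does not issue any real RAID commands,
and the split and rebuild steps would be whatever your RAID
implementation provides.

```python
def rotate(in_mirror, off_site):
    """Perform one rotation of a three-set mirror scheme.

    in_mirror: tuple of the two sets currently mirrored, the one that
               has been in the mirror longest first.
    off_site:  the set currently stored off-site.

    The longest-serving set is split from the mirror and goes off-site;
    the returning set takes its bay and the mirror is rebuilt onto it.
    Returns (new_in_mirror, new_off_site).
    """
    leaving, staying = in_mirror
    return (staying, off_site), leaving


if __name__ == "__main__":
    state = (("A", "B"), "C")
    for step in range(1, 4):
        state = rotate(*state)
        mirror, away = state
        print(f"rotation {step}: mirrored={mirror} off-site={away}")
```

After three rotations every set has spent one turn off-site and the
scheme is back where it started.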

With a RAID mirror scheme you can also practically rotate your off-site
copy as fast as you can rebuild the mirror, assuming you can still do
the actual backups at the same time (i.e. that you don't need the full
write speed of the drives just to do the backups themselves).  You can
use the added capacity to just keep all your archive data online all of
the time and forever (remember we're probably only talking about sites
that don't already have very impressive backup systems in place right
now).  Even if ATA drives don't last quite as long as the tape media
might, it just doesn't matter -- factor in the cost of occasional
replacement drives, and occasional upgrades to higher capacity drives
(to make big leaps in the amount of archive data that can be kept
online), and you will still end up with a lower _total_ cost backup
strategy.

If you really want to create archives to be kept off-line for long-term
historical purposes then write your precious archive data to CD-ROM (or
maybe DVD if you're daring and/or have a lot of data to archive) and
just hope they don't degrade over time (or make a pair or more and store
each copy in a separate location).

(And if you really want to keep multiple generations of backup copies of
individual files, then use a tool that does that online, such as RCS.)

-- 
								Greg A. Woods

+1 416 218-0098;            <g.a.woods@ieee.org>;           <woods@robohack.ca>
Planix, Inc. <woods@planix.com>; VE3TCP; Secrets of the Weird <woods@weird.com>