NetBSD-Users archive


Re: LTO support



On Thu, Aug 12, 2021 at 08:30:01AM -0400, Greg Troxel wrote:
> 
> Pouya Tafti <pouya+lists.netbsd%nohup.io@localhost> writes:
> 
> > I'm looking for a low cost offsite backup solution for my teeny local
> > NAS (couple of TiB of redundant ZFS RAIDZ2 on /amd64 9.2_STABLE) for
> > disaster recovery.  Seemingly affordable LTO-5 drives (~EUR 250; sans
> > libraries) pop up on eBay from time to time, and I thought I might
> > start mailing tape backups to friends and family.  Being rather
> > clueless about tape, I was wondering:
> 
> I used to be a tape fan, because in large volumes it was cheaper and
> because people who said disk was better than tape (20 years ago) did not
> have good answers for "do you send a copy of your bits offsite every
> other week at least?" and some did not have good answers for "do you
> have copies of your bits that are locally offline?".
> 
> All that said, I don't think tape is a bad idea even now.  But I think
> it's the path less traveled.
> 
> <from memory, take with grain of salt>
> 
> I used to run amanda, which runs various backup programs, typically dump
> on BSD and tar on Linux (where the culture is that dump doesn't work, and
> that might be correct).   amanda then splits the dumps into chunks of a
> configured size, typically 1GB back then, and then puts all the chunks
> onto tapes, perhaps multiple tapes.
> 
> In amanda culture, one basically does "dump | gzip" on the host being
> backed up, and then that stream gets an amanda header and is split.
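
For anyone following along, a hand-rolled equivalent of that pipeline
(outside amanda) might look roughly like this; the filesystem, file
names and the 1 GB chunk size are only illustrative, and dump flags
vary a bit per system:

    # level-0 dump of /home to stdout, compressed, split into ~1 GB chunks
    dump -0uf - /home | gzip -c | split -b 1024m - /backup/home.dump0.gz.

    # restore later by reassembling the chunks in order
    cat /backup/home.dump0.gz.* | gunzip -c | restore -rf -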
> 
> I started out with DDS2, then DDS3, then LTO-1 and I think LTO-2 (200 GB
> native without compression?)   This was for backing up maybe 20
> machines.    There were two sets of tapes, a daily set that got fulls
> and incrementals as amanda chose (to fit in a tape and meet frequency
> goals for fulls while getting at least an incremental every run), and
> then an archive set that had runs once a week, fulls only.   For daily,
> tapes were in the drive, and for archive, dumps were manually flushed
> (insert, run amflush, not that manual) on Mondays after weekend dumps.
> Archive tapes were sent offsite and the next one that would need to be
> written was retrieved.
> 
> This setup gave us the ability to immediately restore recent data from
> onsite tapes in the event of an rm screwup or a disk failure.  And it
> gave the ability to not lose the bits in the event of a total building
> loss.
> 
> Probably amanda can use "zfs send" as a dump program, or could be taught
> to do so pretty easily.
> </>
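
For what it's worth, the same pattern works directly with zfs send even
without amanda; a rough sketch, where the pool/dataset, snapshot names
and the tape device are only examples:

    # full send of a snapshot, compressed, straight to a non-rewinding tape
    zfs snapshot tank/data@weekly-1
    zfs send tank/data@weekly-1 | gzip -c | dd of=/dev/nrst0 bs=64k

    # later runs can ship just the delta between two snapshots
    zfs send -i tank/data@weekly-1 tank/data@weekly-2 | gzip -c > /backup/data.1-2.gz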
> 
> These days, disks are pretty cheap, but they probably cost more to mail,
> and mailing them often isn't really good for them.  Think about the cost
> of a 4T external USB drive vs tapes that hold 4T.
> 
> Also consider why you are doing backups.  Most people should be doing
> backups for multiple reasons:
> 
>   - accidental rm
>   - scrambled fs due to OS bug, perhaps provoked by power failure
>   - scrambled fs due to hardware issues (I've seen bad RAM lead to
>     corrupted files)
>   - disk failure
>   - whole computer failure due to lightning, power surge, etc.
> 
>   - cyber attack (ransomware or just generic compromise)
> 
>   - loss of entire building, or more precisely loss of primary disk and
>     local backups at the same time
> 
> 
> Generally backups should be offline, in that once made they are powered
> off and disconnected.  This helps with the first and second groups.  And
> some should be someplace else, to address the third.
> 
> Another strategy is:
> 
>   use disks (multiple) and write to them periodically.  store locally,
>   unplugged.
> 
>   use another set of disks and write to them periodically but take them
>   offsite (and hence unplugged)
> 
>   have disks at remote sites, e.g. plugged into a RPI or apu2, and
>   additionally do backups to those disks over the net
> 
> This gives various protections at varying time scales; it's entirely
> feasible to push backups over the net once a week and take physical
> disks offsite somewhat less often.
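
A minimal sketch of the "push over the net" leg, assuming a remote host
called backuphost with its disk mounted under /backup (rsync is only
one possible transport):

    # mirror /home onto the disk hanging off the remote RPI/apu2
    rsync -aH --delete /home/ backuphost:/backup/home/

    # or keep dated snapshots; --link-dest hard-links files unchanged since
    # the previous snapshot (here assumed to live in /backup/home-prev)
    rsync -aH --link-dest=../home-prev /home/ backuphost:/backup/home-$(date +%Y%m%d)/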
> 
> Once you start thinking like this, you probably want to look at backup
> systems that are able to do deduplication so that you can do backups in
> a way where
> 
>   - every backup is logically a full backup
>   - every backup only writes data that is not already in the archive
> 
> This helps with disk space, with being able to get state from long ago
> (when you realize something went wrong long after the fault), and also
> greatly reduces the data transfer requirements (once you have done the
> first backup).
> 
> I am using bup for this, and others use borgbackup.  Surely there are
> others.

Thanks for the pointers to bup and borgbackup!  I was looking precisely
for a deduplicating storage or backup facility (in some sense, with text
files, CVS-style diffs or ed scripts already give you a diff-and-history
facility; with binary data, you are out of luck).
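
For future reference, the borgbackup flavour of this looks roughly like
the following; the repository location and archive names are invented
for the example:

    # one-time: create a deduplicating repository (it can also live on a
    # remote host, reached over ssh)
    borg init --encryption=repokey /backup/borg-repo

    # each run is logically a full backup, but only chunks that are not
    # already in the repository get written
    borg create --stats /backup/borg-repo::home-{now} /home

    # keep a long history without the repository growing linearly
    borg prune --keep-daily 7 --keep-weekly 4 --keep-monthly 6 /backup/borg-repo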

FWIW, Plan 9 has/had a WORM filesystem (Write Once Read Many) where
storage was deduplicated at the block level, meaning one could keep the
whole history of files while only saving the differences.  Furthermore,
in such a system an attack by ransomware would be useless: data is
never changed once written, only new versions are added; this protects
against accidental deletion as well as malice.  Unfortunately, this
part of Plan 9 did not find its way into the Unix world the way other
bits of it did...
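
The underlying idea (content-addressed blocks) is simple enough to
sketch in a few lines of shell; a toy illustration only, not how
Plan 9's Venti actually stores things:

    # toy content-addressed store: each block is filed under its own hash,
    # so identical blocks are stored once and nothing is ever overwritten
    mkdir -p store
    split -b 8192 bigfile blk.
    for b in blk.*; do
        h=$(openssl dgst -sha256 -r "$b" | cut -d' ' -f1)
        [ -f "store/$h" ] || cp "$b" "store/$h"
        printf '%s\n' "$h" >> bigfile.manifest
    done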

Thank you again for the pointers!
-- 
        Thierry Laronde <tlaronde +AT+ polynum +dot+ com>
                     http://www.kergis.com/
                    http://kertex.kergis.com/
                       http://www.sbfa.fr/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C

