NetBSD-Users archive


Re: LTO support



tlaronde%polynum.com@localhost writes:

>> I am using bup for this, and others use borgbackup.  Surely there are
>> others.
>
> Thanks for the pointers to bup and borgbackup. I was precisely looking
> for a deduplicating storage or backup facility (in some sense, when you
> have text files, cvs-style diffs or ed scripts give you a diff and
> history facility; with binary data, you are out of luck).

You are not at all out of luck with binary.  bup does this and I think
borg does too.

The basic idea is a rolling checksum, similar to rsync: a checksum is
kept over a sliding window of the most recent bytes, and bytes are
accumulated until some number of the checksum's least significant bits
are all 0 (or all 1, I forget, doesn't matter).  bup uses 13 bits by
default and I use 16.  Then the bytes from the start of the chunk up to
the byte at which those 13 bits match are put into a blob and stored.
Because the split points depend only on nearby content, large binary
files that are mostly the same end up with mostly the same blobs.
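
As a rough illustration of the scheme just described, here is a minimal
sketch in Python.  It is not bup's actual code: the polynomial rolling
hash, the 64-byte window, the all-ones boundary test, and the names
split() and store() are assumptions made for the example; only the
13-bit default comes from the description above.

```python
import hashlib

WINDOW = 64            # sliding window size in bytes (assumed value)
BASE = 1000003         # odd multiplier for the polynomial hash (assumed)
MOD = 1 << 32          # keep the hash in 32 bits


def split(data, bits=13):
    """Yield content-defined chunks of data.

    A chunk ends at the byte where the low `bits` bits of the rolling
    hash over the last WINDOW bytes are all 1.
    """
    mask = (1 << bits) - 1
    out_factor = pow(BASE, WINDOW, MOD)   # weight of the byte leaving the window
    h = 0
    start = 0
    for i, byte in enumerate(data):
        h = (h * BASE + byte) % MOD       # bring the new byte in
        if i >= WINDOW:
            # drop the byte that just fell out of the window
            h = (h - data[i - WINDOW] * out_factor) % MOD
        if (h & mask) == mask:
            yield data[start:i + 1]
            start = i + 1
    if start < len(data):
        yield data[start:]                # trailing partial chunk


def store(data, blobs, bits=13):
    """Store data's chunks in the dict `blobs`, keyed by SHA-1.

    Returns the ordered list of chunk ids needed to rebuild data.
    Chunks already present in `blobs` are not stored again.
    """
    ids = []
    for chunk in split(data, bits):
        key = hashlib.sha1(chunk).hexdigest()
        blobs.setdefault(key, chunk)
        ids.append(key)
    return ids
```

Storing a file, then storing a copy with a few bytes inserted in the
middle, should add only the chunk or two around the insertion: the hash
depends only on the last WINDOW bytes, so the split points line up again
shortly after the change.  With 13 bits the average chunk is around 8 KB
on random-looking data; using 16 bits gives chunks around 64 KB.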

And, perhaps obvious, but this scheme leads to 100% deduplication of
whole files that have not changed.  And this is a pretty common case.

I know this sounds a bit crazy, and I may not have described it quite
right, but it really works.  On the server that handles my mail -- so a
lot of coming and going -- there might be 12G of backed up data total
and maybe 300M of new blobs every week.  On a machine that just runs a
server without a lot of data changing, it can be far less.

The rolling checksum scheme really matters for VM images, and database
storage.



> FWIW, Plan9 has/had a WORM filesystem: Write Once Read Many, where
> storage was made with deduplication of blocks, meaning one could
> also have a history of files, saving only the differences. Furthermore,
> in such a system, an attack from ransomware would be useless: data
> is never changed once written, just a new version added; this
> protects against accidental deletion or malice. Unfortunately, this
> part of Plan9 did not find its way into the Unix world the way other
> bits of it did...

To protect against ransomware there needs to NOT be an administrative
interface to clean up old versions.  And for long-term usability you do
need such an interface.

I don't really purge bup backups.  Instead I just get new, bigger disks
every few years and start over and set the old ones aside.



