Subject: Re: Question regarding remote data synchronization
To: J. Buck Caldwell <netbsd3@bitparts.org>
From: Jesper Louis Andersen <jlouis@mongers.org>
List: netbsd-users
Date: 01/08/2004 17:55:09
Quoting J. Buck Caldwell (netbsd3@bitparts.org):
> My company is looking at replacing all of our NetWare servers over time 
> with Samba. We have about 24 branches, each with a file server, and some 
> of the data on the servers is duplicated. 

We are doing something more local, but the setup is the same. We sync a lot
of boxes to the same server using rsync. Each day we hardlink the data into
a new backup directory. The next day's rsync then mirrors into this new
directory.
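
The daily rotation described above could be sketched in Python roughly as
follows. This is an illustration, not the actual sh/perl scripts: the
function names, the date-named directories and the rsync source are all
assumptions. It relies on rsync's default behaviour of writing a changed
file to a temporary file and renaming it into place, so older snapshots
keep their own hardlinked copy (this would not hold with --inplace).

```python
import os
import shutil
from pathlib import Path


def hardlink_snapshot(previous: Path, new: Path) -> None:
    """Recreate the tree of `previous` under `new`, hardlinking every file.

    `new` must not exist yet. Symlinks are recreated as symlinks.
    """
    shutil.copytree(previous, new, symlinks=True, copy_function=os.link)


def rsync_command(source: str, snapshot: Path) -> list:
    """Build the rsync call that mirrors `source` into the fresh snapshot.

    -a preserves permissions, times and so on; --delete removes files
    that have disappeared at the source. The command is only built here,
    not executed; pass it to subprocess.run(..., check=True) to run it.
    """
    return ["rsync", "-a", "--delete", source, f"{snapshot}/"]
```

A run for one day would call hardlink_snapshot(yesterday_dir, today_dir)
first and then execute rsync_command("somehost:/data/", today_dir), where
both the directories and the host are placeholders.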

Cleanup is done by a script watching the disk usage of the backup disk.
Whenever the disk is more than 90% used, the backups we do not want are
weeded out, oldest first, until the usage is below 90%. A second check
ensures that we always have 30 days' worth of backups and warns us if a
delete would take us below this magic number.
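
A minimal sketch of such a cleanup pass, in Python rather than the sh/perl
actually used. It assumes one directory per daily backup, named so that
sorting by name gives oldest first (for example YYYY-MM-DD), and it treats
every backup as a deletion candidate, which is a simplification. The
function and parameter names are made up for the example.

```python
import shutil
from pathlib import Path

USAGE_LIMIT = 0.90  # prune while the disk is more than 90% used
MIN_BACKUPS = 30    # never drop below 30 daily backups


def disk_usage_fraction(path):
    """Return the used fraction (0.0 to 1.0) of the disk holding `path`."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def prune(backup_root, usage=disk_usage_fraction, delete=shutil.rmtree,
          warn=print):
    """Delete the oldest backups until usage is no longer above the limit.

    Stops and warns instead of deleting when only MIN_BACKUPS are left.
    Returns the list of deleted paths.
    """
    # Assumption: directory names sort chronologically.
    snapshots = sorted(p for p in Path(backup_root).iterdir() if p.is_dir())
    deleted = []
    while usage(backup_root) > USAGE_LIMIT:
        if len(snapshots) <= MIN_BACKUPS:
            warn(f"{backup_root}: still above {USAGE_LIMIT:.0%} but only "
                 f"{len(snapshots)} backups left, not deleting")
            break
        oldest = snapshots.pop(0)
        delete(oldest)
        deleted.append(oldest)
    return deleted
```

The usage, delete and warn hooks are parameters only so the logic can be
tried against a scratch directory before it is pointed at a real backup
disk.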

You can recurse the above into a server syncing weekly and again monthly.
Off-site backup is done by mirroring the disks to a) Firewire disks and b)
tapes. This operation is somewhat manual.

The solution is a blend of sh and perl. I would have chosen Ruby or Python
if I were to build a solution these days, since they are much more
maintainable in the long run. One person can probably hack the whole mess
up and test it in two weeks' time.

Things our setup doesn't take care of yet: Atomicity. File system snapshots
should help a lot here.

Also, to estimate the disk space you need, find out how much the data
changes in a day, a week and a month. At our place it is surprisingly low:
about 15% a month.
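
As a back-of-the-envelope illustration of that estimate: with hardlinked
snapshots, unchanged files cost almost nothing extra, so the space needed
is roughly one full copy plus a new copy of whatever changed during the
retention period. The sketch below assumes a constant change rate and that
each changed file is stored once more in full. A file rewritten every day
would take more than this suggests, so treat it as a rough guess. The
500 GB figure is invented; only the 15% comes from the numbers above.

```python
def estimated_backup_size(base_size_gb, monthly_change_rate, months_retained):
    """Rough estimate: one full copy plus a copy of all changed data."""
    return base_size_gb * (1 + monthly_change_rate * months_retained)


# Hypothetical 500 GB of data, 15% change per month, one month retained.
estimate_gb = estimated_backup_size(500, 0.15, 1)
print(f"roughly {estimate_gb:.0f} GB")
```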

-- 
j.