Subject: keeping hash database in sync or very light-weight redundant database
To: None <netbsd-help@netbsd.org>
From: Jeremy C. Reed <reed@reedmedia.net>
List: netbsd-help
Date: 02/21/2007 13:02:35
I want to keep a nearly identical database on five or more (maybe up to 25) 
different servers.

The database is currently in hash(3) format. It has around 3000 entries 
(around 600KB of disk space), but may grow to over 100,000 entries.

Each server reads the database once per minute. (The data is also loaded 
by something else that may use it thousands of times per minute.)

Any suggestions on how I can easily share this data?

Note the data entries have an expiration time.
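
To make the expiration point concrete, here is a minimal Python sketch. It 
assumes each value is stored with an absolute expiry time (seconds since 
the epoch) and uses a plain dict as a stand-in for the hash(3) file. The 
names put, get and purge_expired are hypothetical, not part of any 
existing tool.

```python
import time

def put(db, key, value, ttl, now=None):
    """Store value under key, expiring ttl seconds from now."""
    now = time.time() if now is None else now
    db[key] = (value, now + ttl)

def get(db, key, now=None):
    """Return the value for key, or None if it is missing or expired."""
    now = time.time() if now is None else now
    entry = db.get(key)
    if entry is None or entry[1] <= now:
        return None
    return entry[0]

def purge_expired(db, now=None):
    """Delete every expired entry and return how many were removed."""
    now = time.time() if now is None else now
    expired = [k for k, (_, expires) in db.items() if expires <= now]
    for k in expired:
        del db[k]
    return len(expired)
```

Because get already ignores expired entries, purge_expired only needs to 
run occasionally (for example from the once-per-minute job) to reclaim 
space.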

Some ideas I have:

1) Create a log of the local additions and deletions from the database on 
every system. Then every minute, have every system copy its log to each of 
the other servers, which read it and apply the additions and deletions. 
But systems that are unavailable (even temporarily) will miss additions 
and get out of sync. Deletions are less of a problem, since the expiration 
time handles them automatically (unless there was a manual deletion).

2) Somehow merge all the databases on every available system. But if one 
system adds an entry and another deletes the same entry, the result won't 
be consistent.

3) On every local database addition or deletion, also send the details via 
some UDP broadcast. Have a listener on every system that verifies the 
received data and then applies the addition or deletion to its own 
database. (I can use a packet filter to make sure nothing else can submit 
to the listener, or I can do this over TCP with an SSH or SSL tunnel.) But 
again the data will get out of sync for systems that are unavailable.

4) Maybe I need to use a more advanced version of Berkeley DB (or move 
away from it). I see db4 has Distributed Transactions and replication 
groups. I even found example db4 code for network-based master and clients 
with election priorities, where clients can become masters. I don't know 
anything about this, but a system that uses elections to choose the master 
database server seems like another way to do this.
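
For ideas 1) and 2), one way to make log replay and merging consistent is 
to record every change, including deletions, as a timestamped record and 
let the newest record per key win. The following is only a minimal Python 
sketch of that approach. It assumes reasonably synchronized clocks (for 
example via NTP) and a hypothetical record layout of (timestamp, origin, 
op, value, expires). A dict stands in for the hash(3) file, and all 
function names are made up for illustration.

```python
def newer(a, b):
    """True if record a should replace record b (last writer wins).

    Records are tuples (timestamp, origin, op, value, expires). Ties on
    timestamp are broken by origin so every server picks the same winner.
    """
    return b is None or (a[0], a[1]) > (b[0], b[1])

def apply_record(db, key, record):
    """Apply one add/delete record; replaying the same record is a no-op."""
    if newer(record, db.get(key)):
        db[key] = record
        return True
    return False

def merge(db, other):
    """Merge another server's records into db; return the number of changes."""
    return sum(apply_record(db, k, r) for k, r in other.items())

def lookup(db, key, now):
    """Return the live value, or None if deleted, expired, or absent."""
    rec = db.get(key)
    if rec is None or rec[2] == "del" or rec[4] <= now:
        return None
    return rec[3]
```

A deletion is kept as a "del" record rather than removed outright, so an 
older addition arriving late cannot resurrect the entry. The deletion 
record should carry at least the original entry's expiration time, after 
which both can be purged. A server that was unavailable catches up by 
merging with any peer once it is back.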

I do not want to have a central database server. Every individual system 
must be self-contained -- and cannot expect other servers to be 
available.
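
Idea 3) fits this constraint in one respect: a sender never waits on its 
peers. Below is a minimal Python sketch of such a sender and listener. The 
JSON wire format, port 9999, and all function names are assumptions made 
for illustration. It does no authentication, so it would rely on the 
packet filter (or a tunnel) mentioned above, and a dict stands in for the 
hash(3) file.

```python
import json
import socket

def encode(op, key, value=None, expires=None):
    """Serialize one change as a JSON datagram (hypothetical wire format)."""
    return json.dumps({"op": op, "key": key, "value": value,
                       "expires": expires}).encode("utf-8")

def decode(data):
    """Parse and validate a datagram; return None for anything malformed."""
    try:
        msg = json.loads(data.decode("utf-8"))
    except ValueError:  # covers both bad UTF-8 and bad JSON
        return None
    if not isinstance(msg, dict) or msg.get("op") not in ("add", "del"):
        return None
    if not isinstance(msg.get("key"), str):
        return None
    return msg

def announce(payload, addr=("255.255.255.255", 9999)):
    """Broadcast one datagram; peers that are down simply miss it."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.sendto(payload, addr)

def apply_message(db, msg):
    """Apply a validated change to the local database (a dict here)."""
    if msg["op"] == "add":
        db[msg["key"]] = (msg["value"], msg["expires"])
    else:
        db.pop(msg["key"], None)

def listen(db, port=9999):
    """Receive changes forever and apply the valid ones."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.bind(("", port))
        while True:
            data, _sender = s.recvfrom(65535)
            msg = decode(data)
            if msg is not None:
                apply_message(db, msg)
```

A local change would be sent with something like 
announce(encode("add", "somekey", "somevalue", 1172100000)). Since UDP 
gives no delivery guarantee, this alone does not solve the out-of-sync 
problem for servers that were unavailable.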

I do not want to use a heavy SQL server.

I'd prefer not to use DNS to store my data. (But if you can convince me 
that it would be easiest, let me know.)

  Jeremy C. Reed