NetBSD-Users archive


Re: "Virtual" RAID1



On Mon, Jun 24, 2019 at 07:45:23AM +0930, Brett Lymn wrote:
> On Sat, Jun 22, 2019 at 03:13:21PM +0200, tlaronde%polynum.com@localhost wrote:
> > 
> > Is there something like that existing? the idea being to combine
> > as much as possible existing facilities and just to insert a simple
> > client/server encapsulating "disk" data at the right place (the 
> > pseudo-device) to make it work.
> > 
> 
> Have you looked at coda?  It is a disconnectable file system that will
> automatically cache files locally and allow modifications while the file
> server is unavailable, when the server is available again it will write
> the changes back to the server.

I have read the presentation of its features (I also have a book on
AFS that I still need to read). Coda has features that I'm after, but it
also lacks one: it ignores the reality of disks and the need for
dissymmetry. RAID1 puts an equal burden on both disks, meaning that to
serve one piece of data you stress two disks. Either you use two high-end
server disks, which is a waste, or you use two desktop/terminal
disks and spend your time replacing them, praying that they
do not both die at the same moment.

I have in mind a simple block upon which one could imagine building
distributed data.

This fundamental and elementary block is a couple of storages (I say
"storage" because the word implies neither that these are "disks" nor
where they are).

It is a "couple" because it is dissymmetric: (a,b) != (b,a) (I already
hear the politically correct gangs shouting...).

A couple is thus composed: ( (st1, /dev/null), (st2, /dev/null) ).

/dev/null is here because it is the most reliable and vast storage ever:
you can put in it whatever you personally don't care to keep. It is
also here because on both members there can be filters (files you
replicate and files you don't).

The fundamental feature is that in the couple it is Write Always, Read
Perhaps.

The first member writes and reads always (potentially to/from
/dev/null).

The second member writes always but reads only on failure from the first
member.

Of course a write to /dev/null will always succeed; a read from
/dev/null will always fail. This means that, depending on the filters
applied, data can be duplicated or not (there can also be temporary
memory files that are not kept in any member).
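
To make the "Write Always, Read Perhaps" rule concrete, here is a minimal
user-space sketch in Python. It is only an illustration under stated
assumptions, not a proposed implementation: the names NullStore, DictStore,
Member and Couple are hypothetical, storage is an in-memory dictionary, and
data is addressed by file name rather than by block.

```python
class NullStore:
    """Stand-in for /dev/null: every write succeeds, every read fails."""

    def write(self, key, data):
        return True

    def read(self, key):
        raise KeyError(key)


class DictStore:
    """Hypothetical in-memory storage standing in for a real disk."""

    def __init__(self):
        self.blocks = {}

    def write(self, key, data):
        self.blocks[key] = data
        return True

    def read(self, key):
        return self.blocks[key]


class Member:
    """A (storage, /dev/null) pair; the filter decides where data goes."""

    def __init__(self, store, keep=lambda key: True):
        self.store = store
        self.null = NullStore()
        self.keep = keep  # assumed filter: True means "keep on storage"

    def _target(self, key):
        return self.store if self.keep(key) else self.null

    def write(self, key, data):
        return self._target(key).write(key, data)

    def read(self, key):
        return self._target(key).read(key)


class Couple:
    """Ordered pair (first, second): Write Always, Read Perhaps."""

    def __init__(self, first, second):
        self.first = first
        self.second = second

    def write(self, key, data):
        # Both members always receive the write (possibly into /dev/null).
        self.first.write(key, data)
        self.second.write(key, data)

    def read(self, key):
        # The second member is read only when the first one fails.
        try:
            return self.first.read(key)
        except KeyError:
            return self.second.read(key)
```

With an unfiltered first member and a second member whose filter rejects,
say, names ending in ".tmp", a temporary file lands only on the first
storage, while everything else is duplicated and remains readable if the
first storage loses it.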

One can see that distributed file systems and disconnection can be
handled with this element: since writes are (depending on filters) always
duplicated, if one member is a "local" storage, there is always (unless
discarded by a filter) a local copy of written data on the node. So if
the other component is a remote file service, the disconnected client
can keep working.

In the same spirit, the first component can have a lot of data, but the
local storage will have only what is written by the node (another
dissymmetry on size).
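
The disconnected case can be sketched in the same way. The Python below is
a rough illustration only: RemoteStore, LocalStore, couple_write and
couple_read are hypothetical names, the "remote file service" is simulated
by a dictionary with an online flag, and reporting a failed remote write by
returning False is my assumption about what a caller would want.

```python
class RemoteStore:
    """Hypothetical remote file service that may be unreachable."""

    def __init__(self):
        self.blocks = {}
        self.online = True

    def write(self, key, data):
        if not self.online:
            raise ConnectionError("remote unreachable")
        self.blocks[key] = data

    def read(self, key):
        if not self.online:
            raise ConnectionError("remote unreachable")
        return self.blocks[key]


class LocalStore:
    """Hypothetical local storage: holds only what this node wrote."""

    def __init__(self):
        self.blocks = {}

    def write(self, key, data):
        self.blocks[key] = data

    def read(self, key):
        return self.blocks[key]


def couple_write(remote, local, key, data):
    """Write to both members; the local copy is made even when offline."""
    local.write(key, data)
    try:
        remote.write(key, data)
        return True
    except ConnectionError:
        # Assumption: the caller is told the remote copy is missing.
        return False


def couple_read(remote, local, key):
    """Read from the remote member first, from the local one on failure."""
    try:
        return remote.read(key)
    except (ConnectionError, KeyError):
        return local.read(key)
```

The remote member may hold far more data than the node ever wrote; the
local member only ever contains the node's own writes, which is what lets
a disconnected client keep reading what it has produced.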

In case of disaster on some server with "all" the files, the files are
(if the disaster is handled in a decent time) all still in the local
storages, spread around. Not forever: in my mind the local storage is
limited and can recycle, but when it is almost full the administrator has
to be sure the files are in the backups before deleting them and
reclaiming space.
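
The recycling rule (reclaim space only for files known to be in the
backups) could look like the following Python sketch. The function name
reclaim, the idea of counting capacity in number of files, and dropping the
oldest entries first are all assumptions made for the example.

```python
def reclaim(local_blocks, backed_up, capacity):
    """Delete oldest backed-up entries until at most `capacity` remain.

    local_blocks: dict mapping name -> data, oldest entries first
    backed_up:    set of names confirmed to be present in the backups
    capacity:     assumed limit, counted in number of entries
    """
    for key in list(local_blocks):  # copy the keys; we delete while looping
        if len(local_blocks) <= capacity:
            break
        if key in backed_up:
            del local_blocks[key]
    return local_blocks
```

Entries that are not confirmed in the backups are never deleted, so the
storage can stay above its limit until the administrator has backed them
up.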

For my first application, there is no distributed file system. Such a
couple will be put upstream, on the file server, so that high-end
disks serve as the first member (write and read always, without a
filter) while the second member consists of decent but cheap desktop
disks for backup that write always (filtered; not everything is backed
up) and read only exceptionally, on failure.

And the question is where to put this logic (I think at the
pseudo-device level) so that it can be implemented with a minimum of code
and kernel changes, with more complicated things (distributed,
replicated, fault tolerant, etc.) being developed on top of this
fundamental element, but in user space.

Best,
-- 
        Thierry Laronde <tlaronde +AT+ polynum +dot+ com>
                     http://www.kergis.com/
                       http://www.sbfa.fr/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C

