Port-xen archive


Re: shared disks in domU

On Tue, 8 Sep 2009 18:17:12 +0400
Victor Gamov <vitspec%gmail.com@localhost> wrote:

> 2009/9/8 der Mouse <mouse%rodents-montreal.org@localhost>:
> >>> Can anybody explain to me how I can create a shared RW virtual disk
> >>> in two or more domUs?
> >> What you're doing won't work, and will likely corrupt the file
> >> system and/or panic one or both of the machines.
> I see
> >> The only way I know to do what you want is to use something like
> >> NFS.
> Yes, but I'll try to find another solution.
> > Actually, Victor got exactly what he asked for, a shared RW virtual
> > disk.  What he didn't get was what he wanted, a shared RW
> > filesystem. (The same issues would arise with any other form of
> > direct sharing of the same piece of disk, from two initiators on a
> > SCSI chain to sharing an iSCSI target to dual-ported hardware.  As
> > you point out, in the current state of filesystems, the only option
> > is to interpose a piece of software, such as an NFS server, as an
> > aggregator between the hosts and the disk - and make the hosts
> > aware of it, as NFS client code is.)
> iSCSI is a good point to start.
No, it isn't.  As der Mouse and I have pointed out, getting a
read-write *disk* is easy, but you want a read-write file system.  I
know of none on NetBSD; in fact, I know of none on any Unix-like
system, though Linux has so many file systems it's quite conceivable
there is one for it.

Conceptually, it's not an impossible thing to do -- IBM had it on
OS/360 on its mainframes 30-35 years ago -- but doing it requires
careful attention to locking and some mechanism (either at the disk
drive level or via an out-of-band communications channel) to lock
certain resources to a particular system.  You also have to pay
attention to things like the write cache in the drive hardware, if
you're working with physical drives.

Suppose, for example, that (in a UFS-like file system) you want to
allocate an inode.  You have to lock that portion of the inode list.
(Earlier, you may have to lock something that tells you what portion of
the inode list to look at.) You read the disk block containing it from
the physical disk -- you can't trust your cache, since the previous
write to that block came from the other machine.  You then find a free
inode, allocate it by setting certain fields to certain values, and
write it back to the disk.  You can't release the lock until you know
that the authoritative data store has the value, and that is not
necessarily when you get a "write complete" signal from the hardware,
since the data might still be sitting in a cache.

In a Xen environment, the issue is Xen's management of the blocks.
That is, suppose that two different domUs have the same file allocated
as a disk.  Does the dom0 know that block 17 of the first domU's file
is the same as block 17 of the second's?  Conceptually, it could tell,
but given the rarity of this setup, I doubt that it actually does.
(The reason all this could work on IBM mainframes has to do with the
way disks were managed.  Anyone interested in a tutorial on this is
welcome to contact me off-list...)
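The inode-allocation sequence above can be modeled in a few lines. What follows is a toy Python sketch, not how FFS or any real cluster file system works. `SharedDisk`, `Node` and both allocation methods are hypothetical names; a `threading.Lock` stands in for the disk-level or out-of-band locking mechanism; and `SharedDisk.write()` is assumed to mean the data has reached the authoritative store (a real implementation would also have to deal with the drive's write cache, which this model ignores). The naive method is what single-host code effectively does when two domUs share a disk; the safe one follows the lock, read from disk, allocate, write, unlock order described above.

```python
import threading

# Toy model (hypothetical names throughout): one shared "disk" holding a
# region of the inode list, seen by two or more domUs.

class SharedDisk:
    """The authoritative store. write() stands for the moment the data is
    known to be on stable storage, not merely acknowledged by a cache."""

    def __init__(self, nblocks, inodes_per_block):
        self.blocks = [[None] * inodes_per_block for _ in range(nblocks)]

    def read(self, blkno):
        return list(self.blocks[blkno])  # a fresh copy, like a real disk read

    def write(self, blkno, data):
        self.blocks[blkno] = list(data)


class Node:
    """One domU with its own private block cache."""

    def __init__(self, name, disk, locks):
        self.name = name
        self.disk = disk
        self.locks = locks  # assumed stand-in for an out-of-band lock service
        self.cache = {}

    def alloc_inode_naive(self, blkno):
        # Single-host logic: trust the local cache, take no cross-host lock.
        if blkno not in self.cache:
            self.cache[blkno] = self.disk.read(blkno)
        block = self.cache[blkno]
        i = block.index(None)  # first free inode (ValueError if none)
        block[i] = self.name
        self.disk.write(blkno, block)
        return i

    def alloc_inode_safe(self, blkno):
        with self.locks[blkno]:             # lock this part of the inode list
            block = self.disk.read(blkno)   # bypass the cache: the last write
                                            # may have come from another host
            i = block.index(None)           # find a free inode
            block[i] = self.name            # allocate it
            self.disk.write(blkno, block)   # durable before the lock is dropped
            self.cache[blkno] = block
            return i


if __name__ == "__main__":
    # Naive: both caches are warm before either node allocates.
    disk = SharedDisk(nblocks=1, inodes_per_block=4)
    locks = {0: threading.Lock()}
    a, b = Node("domU-a", disk, locks), Node("domU-b", disk, locks)
    a.cache[0], b.cache[0] = disk.read(0), disk.read(0)
    print(a.alloc_inode_naive(0), b.alloc_inode_naive(0))  # 0 0: same inode twice
    print(disk.read(0))  # domU-a's allocation has been overwritten

    # Safe: four nodes allocating concurrently each get a distinct inode.
    disk = SharedDisk(nblocks=1, inodes_per_block=4)
    locks = {0: threading.Lock()}
    nodes = [Node(f"domU-{k}", disk, locks) for k in range(4)]
    threads = [threading.Thread(target=n.alloc_inode_safe, args=(0,)) for n in nodes]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(sorted(disk.read(0)))
```

In the naive case both nodes are handed inode 0 and the first allocation is silently lost. Note that adding the lock alone would not fix the naive method: the second node would still be working from its stale cached copy of the block, which is why the safe version rereads the block from the disk after taking the lock.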

True story.  On a Xen box I run, I use one domU to build all of the
packages for the others.  Normally, only the build machine has pkgsrc
mounted.  My mode of operation is to build packages, unmount the pkgsrc
file system, and mount it read-only on the others.  I then do binary
package installations on them, and then unmount it.  Once, I forgot the
unmount step and fired off a new build on the usual machine for such
things.  One of the others panicked overnight -- the 'find' command in
/etc/daily ran into an inconsistency in the on-disk metadata, so the
system panicked.  Note that this was a machine that had a read-only
mount; the file system itself wasn't being corrupted.

In short, it's conceptually possible, but it's definitely non-trivial.

                --Steve Bellovin, http://www.cs.columbia.edu/~smb
