Port-xen archive

Instability with NetBSD Dom0

For a couple of years, I've been using NetBSD as a Dom0 with various
versions of Xen.  Most of the time, I've given DomU systems disk through
plain sparse files on the Dom0 (i.e., not as hard partitions).

I have seen some stability issues (freezes) over the years when the Dom0
needed lots of CPU, e.g., as a result of DomU I/O.  But since my
installations were physically nearby, I could simply power cycle the
systems when this happened.

Now, I decided to deploy a large system which is co-located...

The system has two disks:

* wd0 has Xen and NetBSD plus 200 GB for various DomUs.  These DomUs are
for testing but also for critical services in the DMZ, name servers,
etc.  This is a 240 GB Samsung SM843Tn (an "enterprise" SSD).

* wd1 has FreeBSD installed raw on the disk (running as an HVM DomU).
This is a 480 GB Samsung SV843T (also an "enterprise" SSD).

NetBSD is 7.0 BETA and Xen is 4.5, both from about a week ago.  The
machine is a 6-core Ivy Bridge Xeon in a Supermicro system.

All was fine until tonight.  Now I cannot reach the Dom0 nor any DomUs
which live on wd0.  I can still reach the FreeBSD guest, in fact it
seems completely unaffected.

What appears to have triggered the problem was a write operation to a
20 GB file, a file which was created in the Dom0.  The following
operations were performed (the variable 'reldir' points to a release
files directory):

   qemu-img-xen create -f raw disk.img 20G
   vnconfig -c vnd16a disk.img
   newfs -O2 /dev/vnd16a
   mount -o discard,log,noatime /dev/vnd16a /mnt
   df | grep "$mntpt"
   for i in base.tgz comp.tgz etc.tgz; do
       tar -x -z -f "$reldir/sets/$i" -C /mnt
   done
   echo foo

I know the script reached the df command as the output is visible at the
hung terminal.  The 'echo foo' line is never reached.

(I know that /mnt was previously unused, and that vnd16 was free.)

I have performed perhaps 15 installs with the above operations, but all
of them used small (1 GB) images.  No problems there.
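
To narrow down which command is in flight when the hang occurs, each
step could be wrapped so that it is logged before and after it runs.
This is only a minimal sketch: the helper name 'run_step' and the log
path are hypothetical, and the log would have to live on a filesystem
that is not the one under test for the last entry to be trustworthy.

```shell
#!/bin/sh
# Hypothetical helper, not part of the original script.
# Assumed log location; override by setting STEP_LOG beforehand.
STEP_LOG="${STEP_LOG:-/var/tmp/domu-install.log}"

run_step() {
    # Record the command before it starts.
    printf '%s START %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$STEP_LOG"
    # Ask the kernel to flush buffers so the START line is likely on disk
    # before the potentially hanging command begins.
    sync
    # Run the command exactly as given and remember its exit status.
    "$@"
    rc=$?
    # Record completion; a missing END line marks the step that hung.
    printf '%s END rc=%d %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$rc" "$*" >> "$STEP_LOG"
    return "$rc"
}
```

Each line of the script would then be invoked as, for example,
'run_step newfs -O2 /dev/vnd16a'.  After a power cycle, a START entry
without a matching END entry would point at the command that was running
when the Dom0 froze.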

Please advise me.  I need this system to be very stable.  The system it
is about to replace runs FreeBSD with jails, a solution which I want to
avoid as FreeBSD is moving in a direction I cannot handle.  I would
really like to run Xen + NetBSD on the new system.

Some specific questions:

(1) Would you agree that using plain files for DomUs tends to lead to
    stability issues?

(2) I have never used NetBSD's LVM, but would that lead to greater
    stability?  Carving up the disk (wd0) using a plain old NetBSD
    disklabel wouldn't work, as I have some 30 DomUs.

(3) Otherwise, how can I place many DomUs on the disk such that the Dom0
    is not involved?  By carving up the disk with a DOS MBR into 7
    partitions, and then placing a NetBSD label on each?  Irksome, but
    if that's the only way, I'll cope.

(4) Hand on heart, do you think I can make this system stable?  How?

A sensible reaction might be "you run a BETA version of NetBSD and a
BETA integration of Xen 4.5, what did you expect?!".  But as I
mentioned, the problem is not unique to these software versions; I have
seen similar issues with NetBSD 6.1.x + Xen 4.2.y (although not with a
co-located system...).

Please encrypt, key id 0xC8601622