Port-xen archive


Re: Migration - Xen 4.3



On 19/01/2015 13:09, Stephen Borrill wrote:
On Sun, 18 Jan 2015, Brian Marcotte wrote:
Has anyone tried Migration recently?

I'm getting this on Xen 4.3, Linux dom0:

 # xl migrate mydomU otherdom0host
 [...]
libxl: error: libxl_dom.c:1063:libxl__domain_suspend_common_callback: \
 guest didn't acknowledge suspend, cancelling request

Curious: the ack is done when HYPERVISOR_suspend() gets called by the domU (see the link just below).

On the domU (netbsd) console, I see this:

 xenbus_shutdown_handler: xenbus_rm 13
 Flushing disk caches: 21 done

The domain is completely locked up and doesn't even respond to "+++++".

Before xenbus handlers get called, do you see any of the DPRINTK(...) messages found on this page:

http://nxr.netbsd.org/xref/src/sys/arch/xen/xen/xen_machdep.c#xen_prepare_suspend

It is similar with "xm migrate".

Migration is merely a suspend => snapshot (done by hypervisor/xl/xend) => resume on the other host. So if either "suspend" or "resume" does not work, migrate does not work either.
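
Since migration decomposes into those steps, one way to narrow down which half fails is to run them separately on a single dom0 with "xl save" and "xl restore". The following is only a minimal sketch: the function name check_suspend_resume, the XL variable and the save file path are hypothetical names chosen for illustration, and by default it is a dry run that only prints the commands it would execute.

```shell
#!/bin/sh
# Sketch: exercise suspend and resume separately from migration.
# XL defaults to "echo xl" (dry run, commands are only printed);
# set XL=xl on a dom0 to actually run them. XL is an assumption of
# this sketch, not an xl feature.
XL="${XL:-echo xl}"

# check_suspend_resume DOMAIN SAVEFILE   (hypothetical helper)
# Returns 1 if the save (suspend) step fails,
# 2 if the restore (resume) step fails, 0 otherwise.
check_suspend_resume() {
    dom="$1"
    savefile="$2"
    # "xl save" suspends the guest and writes its state to SAVEFILE.
    $XL save "$dom" "$savefile" || return 1
    # "xl restore" recreates the guest from that saved state.
    $XL restore "$savefile" || return 2
    return 0
}
```

For example, "XL=xl check_suspend_resume mydomU /tmp/mydomU.save" followed by "echo $?" would indicate which step returned an error. A hang (as opposed to an error return) would still have to be spotted by hand on the console.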


I have never got migration to succeed, nor suspend/resume (which is
fundamentally the same operation). Sometimes suspend appears to work,
but resume fails.

The usual symptom for me is that the domain just continues
regardless (i.e. it ignores the suspend request) rather than locking up.
However, dom0 waits for confirmation of the suspension and so blocks
any further operations on the domain.

I understand that the problem is reasonably well understood by the
relevant developers, but that debugging is hard.

Now that's a regression: suspend is supposed to finish correctly (e.g. the domain is descheduled and you get a core file that represents the VM's state). Resuming is the culprit, and I have failed to find $SPARETIME to investigate the situation in the last few months; the usual symptom is a corrupted Xen I/O ring that completely thrashes xbdback(4) and leads to a panicked domU (or a very, very slow dom0, as with an interrupt storm).

Did you try suspend/resume on a UP or an MP domU? Using xl or xm?

--
Jean-Yves Migeon

