Port-xen archive


issues with NetBSD/XEN live migrations



Hello netbsd/xen

I maintain a farm of four Xen hosts running Linux dom0s:

HARDWARE
ubuntu1: 	Product Name: IBM 3850 M2 / x3950 M2 -[71414RG]-
slack2 : 	Product Name: IBM 3850 M2 / x3950 M2 -[71414RG]-
slack3 : 	Product Name: IBM 3850 M2 / x3950 M2 -[71414RG]-
slack4 : 	Product Name: IBM 3850 M2 / x3950 M2 -[71414RG]-

XEN VERSIONS
ubuntu1: (XEN) Xen version 4.9.0 (Ubuntu 4.9.0-0ubuntu3) (stefan.bader%canonical.com@localhost) 
slack2 : (XEN) Xen version 4.11-rc (root@example.local) 
slack3 : (XEN) Xen version 4.11-rc (root@example.local) 
slack4 : (XEN) Xen version 4.11-rc (root@example.local) 

LINUX VERSIONS
ubuntu1: 4.13.0-39-generic
slack2 : 4.17.1.slackxen
slack3 : 4.17.1.slackxen
slack4 : 4.4.88.xen

XEN GUESTS
snegw: NetBSD 7.1.2 (XEN3_DOMU.201803151611Z) using Blktap2 tap:qcow2 against a virtual disk
netbsdffs8: NetBSD 8.0_RC1 (XEN3_DOMU.201804191727Z) using Blktap2 tap:tapdisk:aio against a virtual ffs disk/partition 

GUEST MIGRATION TESTS
- migrating snegw from slack3 to slack4 succeeds but prints an error
- migrating snegw from slack4 to slack3 fails
- migrating netbsdffs8 from slack3 to slack4 fails
- migrating netbsdffs8 from slack4 to slack3 fails

## migrating snegw from slack3 to slack4 succeeds but prints an error

root@slack3:~# xl migrate snegw slack4
migration target: Ready to receive domain.
Saving to migration stream new xl format (info 0x3/0x0/1381)
Loading new save file <incoming migration stream> (new xl fmt info 0x3/0x0/1381)
 Savefile contains xl domain config in JSON format
Parsing config from <saved>
xc: info: Saving domain 4, type x86 PV
xc: info: Found x86 PV domain from Xen 4.11
xc: info: Restoring domain
libxl: error: libxl_dom_suspend.c:262:domain_suspend_common_pvcontrol_suspending: Domain 4:guest didn't acknowledge suspend, cancelling request
xc: info: Restore successful
xc: info: XenStore: mfn 0x1761ff, dom 0, evt 1
xc: info: Console: mfn 0x1761fe, dom 0, evt 2
migration target: Transfer complete, requesting permission to start domain.
migration sender: Target has acknowledged transfer.
migration sender: Giving target permission to start.
migration target: Got permission, starting domain.
migration target: Domain started successsfully.
migration sender: Target reports successful startup.
Migration successful.

the guest console shows:

xenbus_shutdown_handler: xenbus_rm 13
Flushing disk caches: 13 done

and the guest is running fine on its new Xen host.

## migrating snegw from slack4 to slack3 fails

slack4# xl migrate snegw slack3
migration target: Ready to receive domain.
Saving to migration stream new xl format (info 0x3/0x0/1381)
Loading new save file <incoming migration stream> (new xl fmt info 0x3/0x0/1381)
 Savefile contains xl domain config in JSON format
Parsing config from <saved>
xc: info: Saving domain 7, type x86 PV
xc: info: Found x86 PV domain from Xen 4.11
xc: info: Restoring domain
libxl: error: libxl_dom_suspend.c:262:domain_suspend_common_pvcontrol_suspending: Domain 7:guest didn't acknowledge suspend, cancelling request
xc: error: Domain has not been suspended: shutdown 0, reason 255: Internal error
xc: error: Save failed (0 = Success): Internal error
libxl: error: libxl_stream_write.c:350:libxl__xc_domain_save_done: Domain 7:saving domain: domain did not respond to suspend request: Success
migration sender: libxl_domain_suspend failed (rc=-8)
xc: error: Failed to read Record Header from stream (0 = Success): Internal error
xc: error: Restore failed (0 = Success): Internal error
libxl: error: libxl_stream_read.c:850:libxl__xc_domain_restore_done: restoring domain: Success
libxl: error: libxl_create.c:1266:domcreate_rebuild_done: Domain 8:cannot (re-)build domain: -3
libxl: error: libxl_domain.c:1034:libxl__destroy_domid: Domain 8:Non-existant domain
libxl: error: libxl_domain.c:993:domain_destroy_callback: Domain 8:Unable to destroy guest
libxl: error: libxl_domain.c:920:domain_destroy_cb: Domain 8:Destruction of domain failed
migration target: Domain creation failed (code -3).
libxl: info: libxl_exec.c:118:libxl_report_child_exitstatus: migration transport process [3527] exited with error status 1
Migration failed, failed to suspend at sender.

the guest console shows:

xenbus_shutdown_handler: xenbus_rm 13
Flushing disk caches: 14 done

and the guest seems frozen.

## migrating netbsdffs8 from slack3 to slack4 fails

slack3# xl migrate netbsdffs8 slack4
migration target: Ready to receive domain.
Saving to migration stream new xl format (info 0x3/0x0/1364)
Loading new save file <incoming migration stream> (new xl fmt info 0x3/0x0/1364)
 Savefile contains xl domain config in JSON format
Parsing config from <saved>
xc: info: Saving domain 6, type x86 PV
xc: info: Found x86 PV domain from Xen 4.11
xc: info: Restoring domain
xc: error: save callback suspend() failed: 0: Internal error
xc: error: Save failed (0 = Success): Internal error
libxl: error: libxl_stream_write.c:350:libxl__xc_domain_save_done: Domain 6:saving domain: domain responded to suspend request: Success
migration sender: libxl_domain_suspend failed (rc=-3)
xc: error: Failed to read Record Header from stream (0 = Success): Internal error
xc: error: Restore failed (0 = Success): Internal error
libxl: error: libxl_stream_read.c:850:libxl__xc_domain_restore_done: restoring domain: Success
libxl: error: libxl_create.c:1266:domcreate_rebuild_done: Domain 5:cannot (re-)build domain: -3
libxl: error: libxl_domain.c:1034:libxl__destroy_domid: Domain 5:Non-existant domain
libxl: error: libxl_domain.c:993:domain_destroy_callback: Domain 5:Unable to destroy guest
libxl: error: libxl_domain.c:920:domain_destroy_cb: Domain 5:Destruction of domain failed
migration target: Domain creation failed (code -3).
libxl: info: libxl_exec.c:118:libxl_report_child_exitstatus: migration transport process [2914] exited with error status 1
Migration failed, resuming at sender.
libxl: error: libxl_dom.c:38:libxl__domain_type: unable to get domain type for domid=6

the guest console shows:

xenbus_shutdown_handler: xenbus_rm 13
Flushing disk caches: 36 done
fatal page fault in supervisor mode
trap type 6 code 0 rip 0xffffffff8020316e cs 0xe030 rflags 0x10256 cr2 0x8 ilevel 0x6 rsp 0xffffa0002b36bcb8
curlwp 0xffffa0000074d020 pid 0.7 lowest kstack 0xffffa0002b3682c0
panic: trap
cpu0: Begin traceback...
vpanic() at netbsd:vpanic+0x140
snprintf() at netbsd:snprintf
trap() at netbsd:trap+0x953
--- trap (number 6) ---
DDB lost frame for netbsd:Xresume_xenev6+0x3e, trying 0xffffa0002b36bbc0
Xresume_xenev6() at netbsd:Xresume_xenev6+0x3e
--- interrupt ---
Bad frame pointer: 0xffffa00000975000
10256:
cpu0: End traceback...

dumping to dev 142,1 (offset=0, size=0): not possible
rebooting...

## migrating netbsdffs8 from slack4 to slack3 fails

slack4# xl migrate netbsdffs8 slack3
migration target: Ready to receive domain.
Saving to migration stream new xl format (info 0x3/0x0/1364)
Loading new save file <incoming migration stream> (new xl fmt info 0x3/0x0/1364)
 Savefile contains xl domain config in JSON format
Parsing config from <saved>
xc: info: Saving domain 3, type x86 PV
xc: info: Found x86 PV domain from Xen 4.11
xc: info: Restoring domain
xc: error: save callback suspend() failed: 0: Internal error
xc: error: Save failed (0 = Success): Internal error
libxl: error: libxl_stream_write.c:350:libxl__xc_domain_save_done: Domain 3:saving domain: domain responded to suspend request: Success
migration sender: libxl_domain_suspend failed (rc=-3)
xc: error: Failed to read Record Header from stream (0 = Success): Internal error
xc: error: Restore failed (0 = Success): Internal error
libxl: error: libxl_stream_read.c:850:libxl__xc_domain_restore_done: restoring domain: Success
libxl: error: libxl_create.c:1266:domcreate_rebuild_done: Domain 5:cannot (re-)build domain: -3
libxl: error: libxl_domain.c:1034:libxl__destroy_domid: Domain 5:Non-existant domain
libxl: error: libxl_domain.c:993:domain_destroy_callback: Domain 5:Unable to destroy guest
libxl: error: libxl_domain.c:920:domain_destroy_cb: Domain 5:Destruction of domain failed
migration target: Domain creation failed (code -3).
libxl: info: libxl_exec.c:118:libxl_report_child_exitstatus: migration transport process [2236] exited with error status 1
Migration failed, resuming at sender.
libxl: error: libxl_dom.c:38:libxl__domain_type: unable to get domain type for domid=3

the guest console shows:

xenbus_shutdown_handler: xenbus_rm 13
Flushing disk caches: 10 done
fatal page fault in supervisor mode
trap type 6 code 0 rip 0xffffffff8020316e cs 0xe030 rflags 0x10256 cr2 0x8 ilevel 0x6 rsp 0xffffa0002b36bcb8
curlwp 0xffffa0000074d020 pid 0.7 lowest kstack 0xffffa0002b3682c0
panic: trap
cpu0: Begin traceback...
vpanic() at netbsd:vpanic+0x140
snprintf() at netbsd:snprintf
trap() at netbsd:trap+0x953
--- trap (number 6) ---
DDB lost frame for netbsd:Xresume_xenev6+0x3e, trying 0xffffa0002b36bbc0
Xresume_xenev6() at netbsd:Xresume_xenev6+0x3e
--- interrupt ---
Bad frame pointer: 0xffffa00000975000
10256:
cpu0: End traceback...

dumping to dev 142,1 (offset=0, size=0): not possible
rebooting...
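
For anyone comparing the four transcripts above: they show two distinct error signatures. Both snegw runs log "guest didn't acknowledge suspend" (the first run still completes), while both netbsdffs8 runs log "save callback suspend() failed" before the guest panics. Below is a minimal Python sketch that sorts a saved `xl migrate` transcript into those two buckets. The function name and the category labels are my own, hypothetical names, not libxl terminology, and the match strings are simply copied from the output quoted in this mail.

```python
import re

# Error signatures copied verbatim from the xl output quoted above.
# The labels on the left are hypothetical names chosen for this sketch.
SIGNATURES = [
    ("suspend_not_acknowledged",
     re.compile(r"guest didn't acknowledge suspend")),
    ("suspend_callback_failed",
     re.compile(r"save callback suspend\(\) failed")),
]


def classify_migration(log_text):
    """Return (succeeded, labels) for one `xl migrate` transcript.

    succeeded is True only if xl printed its final success line;
    labels lists every known error signature found in the text.
    """
    succeeded = "Migration successful." in log_text
    labels = [name for name, pattern in SIGNATURES
              if pattern.search(log_text)]
    return succeeded, labels
```

Fed the first transcript, this reports a successful migration that still carries the "suspend_not_acknowledged" label, which is the case that is easy to overlook when only the exit status is checked.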


I remember having similar trouble migrating NetBSD/Xen guests before, on 4.10.0 hosts with identical dom0s.  What is going wrong?  Are these known issues?  Is NetBSD/Xen not usable in production?

