NetBSD-Bugs archive


port-xen/55207: netbsd domU does not migrate properly from one xen host to another



>Number:         55207
>Category:       port-xen
>Synopsis:       netbsd domU does not migrate properly from one xen host to another
>Confidential:   no
>Severity:       critical
>Priority:       low
>Responsible:    port-xen-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Apr 25 17:55:00 +0000 2020
>Originator:     Pierre-Philipp Braun
>Release:        netbsd-7,netbsd-8,netbsd-9,HEAD
>Organization:
Innopolis University
>Environment:
NetBSD netbsdffs 9.99.59 NetBSD 9.99.59 (XEN3_DOMU) #0: Sat Apr 25 19:38:36 MSK 2020  root@netbsdffs:/usr/objdir/sys/arch/amd64/compile/XEN3_DOMU amd64
>Description:
The Xen farm runs Xen 4.11, but the same would also happen with the latest 4.13 release.
The dom0s here are Slackware Linux 14.2 with kernel 5.5.5.

Migrating from one Xen host to another using `xl migrate` looks good at first:

migration target: Ready to receive domain.
Saving to migration stream new xl format (info 0x3/0x0/993)
Loading new save file <incoming migration stream> (new xl fmt info 0x3/0x0/993)
 Savefile contains xl domain config in JSON format
Parsing config from <saved>
xc: info: Saving domain 11, type x86 PV
xc: info: Found x86 PV domain from Xen 4.11
xc: info: Restoring domain

and then the errors follow:

libxl: error: libxl_dom_suspend.c:262:domain_suspend_common_pvcontrol_suspending: Domain 11:guest didn't acknowledge suspend, cancelling request
xc: error: Bad mfn for suspend record: Internal error
xc: error: mfn 0x2, max 0x2030000: Internal error
xc: error:   m2p[0x2] = 0xffffffffffffffff, max_pfn 0x7ffff: Internal error
xc: error: Save failed (34 = Numerical result out of range): Internal error
libxl: error: libxl_stream_write.c:350:libxl__xc_domain_save_done: Domain 11:saving domain: domain did not respond to suspend request: Numerical result out of range
migration sender: libxl_domain_suspend failed (rc=-8)
xc: error: Failed to read Record Header from stream (0 = Success): Internal error
xc: error: Restore failed (0 = Success): Internal error
libxl: error: libxl_stream_read.c:850:libxl__xc_domain_restore_done: restoring domain: Success
libxl: error: libxl_create.c:1266:domcreate_rebuild_done: Domain 2:cannot (re-)build domain: -3
libxl: error: libxl_domain.c:1034:libxl__destroy_domid: Domain 2:Non-existant domain
libxl: error: libxl_domain.c:993:domain_destroy_callback: Domain 2:Unable to destroy guest
libxl: error: libxl_domain.c:920:domain_destroy_cb: Domain 2:Destruction of domain failed
migration target: Domain creation failed (code -3).
libxl: info: libxl_exec.c:118:libxl_report_child_exitstatus: migration transport process [10506] exited with error status 1
Migration failed, failed to suspend at sender.

...which crashes the guest.

The guest console does not show anything after:

[ 124.2702597] xenbus_shutdown_handler: xenbus_rm 13
[ 124.4904546] Flushing disk caches: done

However, during earlier migration tests with a NetBSD 9.0 dom0, I could see more:

[ 106.6704667] xenbus_shutdown_handler: xenbus_rm 13
[ 106.7003299] Flushing disk caches: 7 done
[ 106.7103358] fatal page fault in supervisor mode
[ 106.7103358] trap type 6 code 0 rip 0xffffffff8020313f cs 0xe030 rflags 0x10256 cr2 0
0xffffd180790ffcc8
[ 106.7103358] curlwp 0xffffd18003a378e0 pid 0.7 lowest kstack 0xffffd180790fc2c0
[ 106.7103358] panic: trap
[ 106.7103358] cpu0: Begin traceback...
[ 106.7103358] vpanic() at netbsd:vpanic+0x143
[ 106.7103358] snprintf() at netbsd:snprintf
[ 106.7103358] startlwp() at netbsd:startlwp
[ 106.7103358] alltraps() at netbsd:alltraps+0xae
[ 106.7103358] sleepq_block() at netbsd:sleepq_block+0x19a
[ 106.7103358] cv_wait() at netbsd:cv_wait+0x9e

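The three xc error lines in the failed-migration log above can be read as a range check on the suspend record's machine frame number. The following is a minimal sketch of that reading, not libxc code: the function names are hypothetical, and the two comparisons are an assumption inferred from the fields printed in the log (the real check may test more than this). With the logged values, the mfn itself is within bounds, but its m2p entry is all ones and therefore far above max_pfn, which would match the reported error 34 (ERANGE on Linux, "Numerical result out of range").

```python
import re

# The three relevant xc lines, copied verbatim from the migration log above.
LOG = """\
xc: error: Bad mfn for suspend record: Internal error
xc: error: mfn 0x2, max 0x2030000: Internal error
xc: error:   m2p[0x2] = 0xffffffffffffffff, max_pfn 0x7ffff: Internal error
"""


def parse_bad_mfn(log_text):
    """Extract mfn, max mfn, m2p entry and max_pfn from the xc error lines.

    Hypothetical helper for illustration only.
    """
    first = re.search(r"mfn (0x[0-9a-f]+), max (0x[0-9a-f]+)", log_text)
    second = re.search(
        r"m2p\[(0x[0-9a-f]+)\] = (0x[0-9a-f]+), max_pfn (0x[0-9a-f]+)",
        log_text,
    )
    if first is None or second is None:
        raise ValueError("expected xc error lines not found")
    return {
        "mfn": int(first.group(1), 16),
        "max_mfn": int(first.group(2), 16),
        "m2p_entry": int(second.group(2), 16),
        "max_pfn": int(second.group(3), 16),
    }


def classify(values):
    """Assumed, simplified range checks implied by the logged fields."""
    if values["mfn"] > values["max_mfn"]:
        return "mfn beyond max mfn"
    if values["m2p_entry"] > values["max_pfn"]:
        return "m2p entry beyond max_pfn"
    return "in range"


values = parse_bad_mfn(LOG)
verdict = classify(values)
print(verdict)
```

An mfn as small as 0x2 combined with an unset m2p entry suggests that the value read as the suspend record was never filled in by the guest, which is consistent with the earlier "guest didn't acknowledge suspend" message.
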
>How-To-Repeat:
Get a Xen farm up and running with at least two nodes, then migrate a NetBSD domU from one node to another with `xl migrate`.  Shared storage and networking are not even necessary to reproduce the bug.  Tested with GNU/Linux dom0s here, but the problem might be the same with NetBSD dom0s.
>Fix:
The problem has been there since 2015 at least, so apparently it is not an easy fix.
https://mail-index.netbsd.org/port-xen/2015/01/18/msg008440.html


