Port-xen archive


Re: Xen/Xentools 3.3 Domain-Unnamed



On Tuesday 19 August 2008 10:30:00 bsd-xen%roguewrt.org@localhost wrote:
> Christoph Egger wrote:
> > On Friday 15 August 2008 12:00:51 bsd-xen%roguewrt.org@localhost wrote:
> >> Christoph Egger wrote:
> >>> On Friday 15 August 2008 08:55:55 Sarton O'Brien wrote:
> >>>> When shutting down or rebooting a domu I'm left with:
> >>>>
> >>>> Domain-Unnamed                               1   467     1     ---s--   46.0
> >>>>
> >>>> In 'xm list' ... but not in 'xm top' (it displays for a little while in
> >>>> 'xm top' then disappears). If rebooting, the domu will not start up by
> >>>> itself.
> >>>>
> >>>> I can't kill it and when doing multiple domu reboots, only one ever
> >>>> exists.
> >>>
> >>> Every time I try to destroy it, I see that it frees some memory. So
> >>> repeating 'xm destroy <domain>; xm list' finally kills it.
> >>>
> >>> There's something asynchronous within the hypervisor which obviously
> >>> needs to be debugged.
> >>>
> >>> This happens for both PV and HVM guests.
> >
> > I tracked this issue down. The root cause is a discrepancy in the
> > error code *values* between AT&T Unix Version 6 and Unix System V.
> >
> > Linux, Xen and Solaris use the Unix System V error codes.
> > *BSD uses the AT&T Unix Version 6 error codes.
> >
> > After shutting down (or rebooting) a domU, the guest container gets
> > destroyed. This implies freeing resources used by the guest (RAM,
> > internal management structures, etc.).
> >
> > The destroy process is asynchronous so that it does not block the
> > Dom0 (and other DomUs).
> >
> > The destroy process works this way:
> >
> > The XEN_DOMCTL_destroydomain hypercall is invoked from the xentools
> > (python and libxc code).
> > In the hypervisor:
> > XEN_DOMCTL_destroydomain hypercall calls domain_kill().
> > domain_kill() calls domain_relinquish_resources().
> > domain_relinquish_resources() calls relinquish_memory().
> > relinquish_memory() calls hypercall_preempt_check().
> >
> > hypercall_preempt_check() is what makes all this asynchronous.
> > It fails if there is another hypercall pending.
> > In that case relinquish_memory() returns EAGAIN, which
> > means: just retry to continue the destroy process.
> >
> > EAGAIN is passed through the return path back into the python code
> > (= userspace). The python code checks for EAGAIN and *should*
> > retry, but it doesn't.
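
To make the "retry" concrete: roughly, what the tools side is expected to
do looks like the sketch below. It is written in C against libxc's
xc_domain_destroy() purely for illustration; the wrapper name is made up
and this is not the actual xend/python code.

#include <errno.h>
#include <stdint.h>
#include <xenctrl.h>

/* Keep re-issuing the destroy until the hypervisor has finished
 * relinquishing the guest's resources. xc_domain_destroy() returns
 * -1 with errno set on failure; EAGAIN only means "not finished yet,
 * call again". */
static int
destroy_domain_with_retry(int xc_handle, uint32_t domid)
{
	int rc;

	do {
		rc = xc_domain_destroy(xc_handle, domid);
	} while (rc < 0 && errno == EAGAIN);

	return rc;
}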
> >
> >
> > In Unix System V, EAGAIN has the error code value 11.
> > In AT&T Unix Version 6, EDEADLK has the error code value 11.
> >
> > Remember, as I said, Xen uses Unix System V error code values, while
> > *BSD uses AT&T Unix Version 6 error code values.
> >
> > This means that when Xen returns EAGAIN, the python code sees
> > EDEADLK. This leads to the confusing
> > "domain destroy failed due to 'Resource deadlock avoided'"
> > error message.
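
For reference, the concrete numbers behind this (not part of the analysis
above, but the standard Linux and NetBSD definitions) show that the
collision is symmetric, which is why the remapping further down has to
swap both codes:

/*
 * Linux / Xen:   EAGAIN  == 11    EDEADLK == 35
 * NetBSD:        EDEADLK == 11    EAGAIN  == 35
 *
 * A raw 11 coming back from the hypervisor is read as EDEADLK by
 * NetBSD userland ("Resource deadlock avoided"), and a raw 35 would
 * be read as EAGAIN.
 */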
> >
> > I informed XenSource about this to find a solution.
>
> Nice catch ... and thanks for all the information! Greatly appreciated.

This is what I got from XenSource:

--------------------------------------------------------------------------------------
You'll need to do errno remapping at the bottom of libxenctrl/libxenguest,
or within your equivalent of Linux's privcmd kernel driver.

Either is easy -- even in libxc there's one shared macro or function that
provides access to the actual hypercall interface, I'm pretty sure.
--------------------------------------------------------------------------------------
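
For illustration, remapping "at the bottom of libxc" would mean wrapping
the one place that issues the raw hypercall ioctl, roughly like the sketch
below (the function name and header path are guesses, not the real libxc
code):

#include <sys/ioctl.h>
#include <errno.h>
#include <xen/xenio.h>		/* IOCTL_PRIVCMD_HYPERCALL; path may differ */

/* Issue the raw hypercall ioctl and translate the two errno values
 * that collide between Xen's (Linux-style) numbering and NetBSD's. */
static int
do_hypercall_remapped(int privcmd_fd, void *hypercall)
{
	int ret = ioctl(privcmd_fd, IOCTL_PRIVCMD_HYPERCALL, hypercall);

	if (ret < 0) {
		switch (errno) {
		case EDEADLK:		/* value 11: Xen meant EAGAIN */
			errno = EAGAIN;
			break;
		case EAGAIN:		/* value 35: Xen meant EDEADLK */
			errno = EDEADLK;
			break;
		}
	}
	return ret;
}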

I first tried to fix this in the xentools as suggested by XenSource.
But then I noticed this had an impact on *all* Dom0 privcmd ioctls,
which made things worse.

Reason: Only IOCTL_PRIVCMD_HYPERCALL is a passthrough from
userspace into the hypervisor and back. All others also perform
hypercalls into the hypervisor, but their return codes
are never passed back to userspace.

Therefore, the easiest fix (with minimal effort) is to do the error
code remapping in sys/arch/xen/xen/privcmd.c right after the hypercall
returns, in IOCTL_PRIVCMD_HYPERCALL.

The attached patch does this and works fine.
With it, no phantom "Domain-Unnamed" domains are left behind.

Christoph
Index: privcmd.c
===================================================================
RCS file: /cvsroot/src/sys/arch/xen/xen/privcmd.c,v
retrieving revision 1.27
diff -u -p -r1.27 privcmd.c
--- privcmd.c   18 Aug 2008 23:09:37 -0000      1.27
+++ privcmd.c   19 Aug 2008 11:39:39 -0000
@@ -77,6 +77,23 @@ static int privpgop_fault(struct uvm_fau
                         int, int, vm_prot_t, int);
 static int privcmd_map_obj(struct vm_map *, vaddr_t, paddr_t *, int, int);
 
+
+static int
+privcmd_xen2bsd_errno(int error)
+{
+       /* Xen uses System V error codes.
+        * In order to keep bloat as minimal as possible,
+        * only convert what really impacts us. */
+       switch (-error) {
+       case EDEADLK:
+               return EAGAIN;
+       case EAGAIN:
+               return EDEADLK;
+       default:
+               return -error;
+       }
+}
+
 static int
 privcmd_ioctl(void *v)
 {
@@ -150,11 +167,11 @@ privcmd_ioctl(void *v)
                                error = 0;
                        } else {
                                /* error occured, return the errno */
-                               error = -error;
+                               error = privcmd_xen2bsd_errno(error);
                                hc->retval = 0;
                        }
                } else {
-                       error = -error;
+                       error = privcmd_xen2bsd_errno(error);
                }
                break;
        }

