Port-xen archive


Xen balloon driver rewrite



Hi list,

So, in an attempt to add most of the missing pieces to the current Xen balloon driver, I ended up rewriting most of the logic behind it. It is not finished yet, but really close (FYI, I am attaching a patch). Only the workqueue part remains to be done; that is about one or two hours of coding, then testing. The balloon will be enabled by default for -6.

The old design used a dedicated thread to queue balloon operations and handle inflating/deflating. The "new" driver will instead be workqueue(9) based, as that simplifies the locking and the handling of ballooning errors, especially the error feedback from balloon_thread. On error, I will now simply log it and terminate the worker.
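As a rough illustration of the "small steps, log and stop on error" behaviour described above, here is a minimal userland sketch in plain C. It does not use workqueue(9) or any kernel API; balloon_run(), the balloon_step_fn callback type, step_with_budget() and SKETCH_DELTA are hypothetical names introduced only for this example (SKETCH_DELTA mirrors the BALLOON_DELTA value from the patch).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define SKETCH_DELTA 256	/* pages per step; mirrors BALLOON_DELTA */

/*
 * Hypothetical step callback: tries to move up to npages pages and
 * returns the number of pages actually moved, or -1 on error.
 */
typedef long (*balloon_step_fn)(size_t npages, void *arg);

/*
 * Move `total` pages in increments of at most SKETCH_DELTA pages.
 * On error, or on partial progress, log and stop instead of retrying.
 * Returns the number of pages moved.
 */
static size_t
balloon_run(size_t total, balloon_step_fn step, void *arg)
{
	size_t done = 0;

	assert(step != NULL);

	while (done < total) {
		size_t chunk = total - done;
		long moved;

		if (chunk > SKETCH_DELTA)
			chunk = SKETCH_DELTA;

		moved = step(chunk, arg);
		if (moved < 0) {
			fprintf(stderr, "balloon: step failed after %zu pages\n",
			    done);
			break;
		}

		done += (size_t)moved;
		if ((size_t)moved < chunk)
			break;	/* partial step: give up for now */
	}

	return done;
}

/*
 * Example callback (an assumption for testing): succeeds until the
 * page budget pointed to by arg is exhausted, then reports an error.
 */
static long
step_with_budget(size_t npages, void *arg)
{
	size_t *budget = arg;

	if (*budget == 0)
		return -1;
	if (npages > *budget)
		npages = *budget;
	*budget -= npages;
	return (long)npages;
}
```

The caller compares the returned count with the requested one to find out whether the target was reached. In the real driver the callback would be the inflate or deflate path, and the loop would run in the worker.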

The sysctl tree kern.xen.balloon has 4 nodes (values in KiB):
- mem-max: the mem-max value associated with the domain, obtained from XenStore.
- mem-min: a safeguard value; the domain refuses to balloon its memory below this mark (protective measure).
- mem-target: the target for the domain's memory reservation. It may not be reached, especially if it tries to go below the mem-min value.
- mem-current: the current memory reservation of the domain.
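To make the relationship between these nodes concrete, here is a minimal sketch of the sanity check a new mem-target value goes through before it is accepted. It is plain userland C, not the driver code; the helper name balloon_check_target_kb() and the sample values in the usage note are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper: validate a requested target against the
 * mem-min and mem-max values. All three arguments are in KiB.
 * Returns 0 if the target is acceptable, EINVAL otherwise.
 */
static int
balloon_check_target_kb(uint64_t target_kb, uint64_t min_kb, uint64_t max_kb)
{
	assert(min_kb <= max_kb);

	/* Safeguard: refuse to balloon below mem-min. */
	if (target_kb < min_kb)
		return EINVAL;

	/* Should not happen: the hypervisor is expected to block this. */
	if (target_kb > max_kb)
		return EINVAL;

	return 0;
}
```

For example, with mem-min at 65536 KiB and mem-max at 1048576 KiB, a target of 524288 KiB is accepted, while 32768 KiB is rejected with EINVAL.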

Note that ballooning is always an operation that requires the domain's cooperation.

All values are in KiB. From a user perspective, I am wondering whether the values should instead be given in bytes or in pages:
- the Xen hypercalls use pages, of PAGE_SIZE size.
- the XenStore stores some of these values in KiB.

I tend to be against values in pages, because they are often derived from values in bytes, converted manually through PAGE_SHIFT shifts. Also, I assume that someday, maybe, sysctl(8) will be able to use dehumanize_number(). Does anyone have an opinion on this matter? My current code uses uint64_t for the sysctl KiB values, so they can be converted to bytes safely. Using just pages would turn the code to size_t everywhere, though.
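The conversions under discussion can be sketched as follows, in plain userland C. This is only an illustration: SKETCH_PAGE_SIZE is an assumed 4 KiB page size standing in for the kernel's PAGE_SIZE, and the function names pages_to_kb(), kb_to_pages() and kb_to_bytes() are hypothetical. They follow the same idea as the BALLOON_PAGES_TO_KB and BALLOON_KB_TO_PAGES macros in the patch (64-bit arithmetic, rounding up to a whole page).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages; the kernel would use PAGE_SIZE here. */
#define SKETCH_PAGE_SIZE	4096u
#define SKETCH_PAGE_SIZE_KB	(SKETCH_PAGE_SIZE >> 10)

/* Pages to KiB, computed in 64 bits. */
static uint64_t
pages_to_kb(uint64_t pages)
{
	return pages * SKETCH_PAGE_SIZE_KB;
}

/* KiB to pages, rounding up so that a partial page counts as one. */
static uint64_t
kb_to_pages(uint64_t kb)
{
	return (kb + SKETCH_PAGE_SIZE_KB - 1) / SKETCH_PAGE_SIZE_KB;
}

/*
 * KiB to bytes. Returns 1 and stores the result in *bytes on success,
 * or 0 if the result would not fit in 64 bits.
 */
static int
kb_to_bytes(uint64_t kb, uint64_t *bytes)
{
	assert(bytes != NULL);

	if (kb > UINT64_MAX / 1024)
		return 0;
	*bytes = kb * 1024;
	return 1;
}
```

With 4 KiB pages, 256 pages convert to 1024 KiB, and a request of 5 KiB rounds up to 2 pages.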

--
Jean-Yves Migeon
jeanyves.migeon%free.fr@localhost
Index: sys/arch/xen/xen/balloon.c
===================================================================
RCS file: /cvsroot/src/sys/arch/xen/xen/balloon.c,v
retrieving revision 1.6
diff -u -p -r1.6 balloon.c
--- sys/arch/xen/xen/balloon.c  12 Nov 2010 13:18:59 -0000      1.6
+++ sys/arch/xen/xen/balloon.c  6 Apr 2011 08:35:52 -0000
@@ -31,29 +31,36 @@
  */
 
 /*
- * The Xen balloon driver enables growing and shrinking PV
- * domains on the fly, by allocating and freeing memory directly.
- */
-
-#define BALLOONDEBUG 1
-
-/*
- * sysctl TODOs:
- * xen.balloon
- * xen.balloon.current: DONE
- * xen.balloon.target: DONE
- * xen.balloon.low-balloon: In Progress
- * xen.balloon.high-balloon: In Progress
- * xen.balloon.limit: XXX
+ * The Xen balloon driver enables growing and shrinking PV domains
+ * memory on the fly, by allocating and freeing memory pages directly.
+ * This management needs domain cooperation to work properly, especially
+ * during balloon_inflate() operation, where a domain gives back memory to
+ * the hypervisor.
+ *
+ * Shrinking memory on a live system is a difficult task, and may render
+ * it unstable or lead to crash. The driver takes a conservative approach
+ * there by doing memory operations in small steps of a few MiB each time. It
+ * will also refuse to decrease reservation below a certain threshold
+ * (XEN_RESERVATION_MIN), so as to avoid a complete kernel memory exhaustion.
  *
- * sysctl labels = { 'current'      : 'Current allocation',
- *           'target'       : 'Requested target',
- *           'low-balloon'  : 'Low-mem balloon',
- *           'high-balloon' : 'High-mem balloon',
- *           'limit'        : 'Xen hard limit' }
+ * XXX The balloon driver does not currently "plug" new pages into uvm(9)
+ * when more memory is available than at boot time. So ballooning above
+ * physmem is rather useless.
  *
+ * The user can intervene at two different levels to manage the ballooning
+ * of a domain:
+ * - directly within the domain, using a sysctl(9) interface.
+ * - through the Xentools, by modifying the memory/target entry associated
+ *   to a domain. This is usually done in dom0.
+ *
+ * Both sysctl(9) nodes and memory/target entry assume that the values passed
+ * to them are in KiB. Internally, the driver will convert this value in
+ * pages (assuming a page is PAGE_SIZE bytes), and issue the correct hypercalls
+ * to decrease/increase domain's reservation accordingly.
  */
 
+#define BALLOONDEBUG 1
+
 #include <sys/cdefs.h>
 __KERNEL_RCSID(0, "$NetBSD: balloon.c,v 1.6 2010/11/12 13:18:59 uebayasi Exp $");
 
@@ -78,47 +85,50 @@ __KERNEL_RCSID(0, "$NetBSD: balloon.c,v 
 
 #define BALLOONINTERVALMS 100 /* milliseconds */
 
-#define BALLOON_DELTA 1024 /* The maximum increments allowed in a
+#define BALLOON_DELTA 256  /* The maximum increments allowed in a
                            * single call of balloon_inflate() or
-                           * balloon_deflate
+                           * balloon_deflate()
                            */
 #define BALLOON_RETRIES 4  /* Number of time every (in|de)flate of
                            * BALLOON_DELTA or less, occurs
                            */
 
-/* XXX: fix limits */
-#define BALLOON_BALLAST 256 /* In pages */
+/*
+ * Safeguard value. Refuse to go below this threshold, so that domain
+ * can keep some free pages for its own use. Value is arbitrary, and may
+ * evolve with time.
+ */
+#define BALLOON_BALLAST 256 /* In pages - 1MiB */
 #define XEN_RESERVATION_MIN (uvmexp.freemin + BALLOON_BALLAST) /* In pages */
-#define XEN_RESERVATION_MAX nkmempages /* In pages */
 
 /* KB <-> PAGEs */
-#define BALLOON_PAGES_TO_KB(_pg) (_pg * PAGE_SIZE / 1024)
-#define BALLOON_KB_TO_PAGES(_kb) (_kb * 1024 / PAGE_SIZE)
-#define BALLOON_PAGE_FLOOR(_kb) (_kb & PAGE_MASK)
+#define PAGE_SIZE_KB (PAGE_SIZE >> 10) /* page size in KB */
+#define BALLOON_PAGES_TO_KB(_pg) ((uint64_t)_pg * PAGE_SIZE_KB)
+#define BALLOON_KB_TO_PAGES(_kb) (roundup(_kb, PAGE_SIZE_KB) / PAGE_SIZE_KB)
 
 /* Forward declaration */
 static void xenbus_balloon_watcher(struct xenbus_watch *, const char **,
                                   unsigned int);
 
+/*
+ * A balloon page entry. Needed to track pages put/reclaimed from balloon
+ */
 struct balloon_page_entry {
        struct vm_page *pg;
        SLIST_ENTRY(balloon_page_entry) entry;
 };
 
 static struct balloon_conf {
-       kmutex_t flaglock; /* Protects condvar (below) */
-       kcondvar_t cv_memchanged; /* Notifier flag for target (below) */
-
-       kmutex_t tgtlock; /* Spin lock, protects .target, below */
-       size_t target; /* Target VM reservation size, in pages. */
+       kmutex_t balloon_mtx; /* Mutex, protects condvar and target (below) */
+       kcondvar_t balloon_cv; /* Condvar variable for target (below) */
+       size_t balloon_target; /* Target domain reservation size, in pages. */
 
-       /* The following are not protected by above locks */
+       /* Linked list of pages used by balloon */
        SLIST_HEAD(, balloon_page_entry) balloon_page_entries;
        size_t balloon_num_page_entries;
 
-       /* Balloon limits */
-       size_t xen_res_min;
-       size_t xen_res_max;
+       /* Minimum amount of memory reserved by domain, in KiB */
+       uint64_t balloon_res_min;
 } balloon_conf;
 
 static struct xenbus_watch xenbus_balloon_watch = {
@@ -126,17 +136,13 @@ static struct xenbus_watch xenbus_balloo
        .xbw_callback = xenbus_balloon_watcher,
 };
 
-static uint64_t sysctl_current;
-static uint64_t sysctl_target;
-
 /* List of MFNs for inflating/deflating balloon */
-static xen_pfn_t *mfn_lista;
+static xen_pfn_t *mfn_list;
 
 /* Returns zero, on error */
 static size_t
 xenmem_get_maxreservation(void)
 {
-#if 0   /* XXX: Fix this call */
        int s, ret;
 
        s = splvm();
@@ -151,12 +157,9 @@ xenmem_get_maxreservation(void)
        }
 
        return ret;
-#else
-       return nkmempages;
-#endif
 }
 
-/* Returns zero, on error */
+/* Returns current reservation, in pages */
 static size_t
 xenmem_get_currentreservation(void)
 {
@@ -170,25 +173,13 @@ xenmem_get_currentreservation(void)
        if (ret < 0) {
                panic("Could not obtain hypervisor current "
                    "reservation for VM\n");
-               return 0;
        }
 
        return ret;
 }
 
-/* 
- * The target value is managed in 3 variables:
- * a) Incoming xenbus copy, maintained by the hypervisor.
- * b) sysctl_target: This is an incoming target value via the
- *    sysctl(9) interface.
- * c) balloon_conf.target
- *    This is the canonical current target that the driver tries to
- *    attain.
- *
- */
-
-
-static size_t
+/* Get value (in KiB) of memory/target in XenStore for current domain */
+static uint64_t
 xenbus_balloon_read_target(void)
 {
        unsigned long long new_target;
@@ -198,16 +189,13 @@ xenbus_balloon_read_target(void)
                return 0;
        }
 
-       /* Returned in KB */
-
        return new_target;
 }
 
+/* Set memory/target value (in KiB) in XenStore for current domain */
 static void
 xenbus_balloon_write_target(unsigned long long new_target)
 {
-
-       /* new_target is in KB */
        if (0 != xenbus_printf(NULL, "memory", "target", "%llu", new_target)) {
                printf("error, couldn't write xenbus target node\n");
        }
@@ -215,57 +203,31 @@ xenbus_balloon_write_target(unsigned lon
        return;
 }
 
-static size_t
-balloon_get_target(void)
-{
-       size_t target;
-
-       mutex_spin_enter(&balloon_conf.tgtlock);
-       target = balloon_conf.target;
-       mutex_spin_exit(&balloon_conf.tgtlock);
-
-       return target;
-
-}
-
-static void
-balloon_set_target(size_t target)
-{
-
-       mutex_spin_enter(&balloon_conf.tgtlock);
-       balloon_conf.target = target;
-       mutex_spin_exit(&balloon_conf.tgtlock);
-
-       return;
-
-}
-
 /*
  * This is the special case where, due to the driver not reaching
  * current balloon_conf.target, a new value is internally calculated
  * and fed back to both the sysctl and the xenbus interfaces,
  * described above.
  */
+#if 0
 static void
 balloon_feedback_target(size_t newtarget)
 {
        /* Notify XenStore. */
        xenbus_balloon_write_target(BALLOON_PAGES_TO_KB(newtarget));
        /* Update sysctl value XXX: Locking ? */
-       sysctl_target = BALLOON_PAGES_TO_KB(newtarget);
+       sysctl_target = newtarget;
 
        /* Finally update our private copy */
-       balloon_set_target(newtarget);
-}
-
-
-/* Number of pages currently used up by balloon */
-static size_t
-balloon_reserve(void)
-{
-       return balloon_conf.balloon_num_page_entries;
+       //balloon_set_target(newtarget);
 }
+#endif
 
+/*
+ * Reserve @npages pages of domain's memory. For each reserved page, add
+ * it to the list of MFNs that will be passed as argument to hypervisor
+ * memory operation
+ */
 static size_t
 reserve_pages(size_t npages, xen_pfn_t *mfn_list)
 {
@@ -279,10 +241,13 @@ reserve_pages(size_t npages, xen_pfn_t *
 
        for (rpages = 0; rpages < npages; rpages++) {
                
-               pg = uvm_pagealloc(NULL, 0, NULL,
-                                  UVM_PGA_ZERO);
+               pg = uvm_pagealloc(NULL, 0, NULL, UVM_PGA_ZERO);
+               if (pg == NULL)
+                       break;
 
-               if (pg == NULL) {
+               bpg_entry = kmem_alloc(sizeof *bpg_entry, KM_SLEEP);
+               if (bpg_entry == NULL) {
+                       uvm_pagefree(pg);
                        break;
                }
 
@@ -294,24 +259,12 @@ reserve_pages(size_t npages, xen_pfn_t *
 
                /* Invalidate pg */
                xpmap_phys_to_machine_mapping[
-                       (pa - XPMAP_OFFSET) >>  PAGE_SHIFT
+                       (pa - XPMAP_OFFSET) >> PAGE_SHIFT
                        ] = INVALID_P2M_ENTRY;
 
                splx(s);
 
-               /* Save mfn */
-               /* 
-                * XXX: We don't keep a copy, but just save a pointer
-                * to the uvm pg handle. Is this ok ?
-                */
-
-               bpg_entry = kmem_alloc(sizeof *bpg_entry, KM_SLEEP);
-
-               if (bpg_entry == NULL) {
-                       uvm_pagefree(pg);
-                       break;
-               }
-
+               /* Save MFN */
                bpg_entry->pg = pg;
 
                SLIST_INSERT_HEAD(&balloon_conf.balloon_page_entries, 
@@ -322,24 +275,28 @@ reserve_pages(size_t npages, xen_pfn_t *
        return rpages;
 }
 
+/*
+ * Reclaim @npages pages from domain's balloon. For each reclaimed page,
+ * remove it from the list of reserved pages, and give them back to
+ * uvm(9).
+ */
 static size_t
-unreserve_pages(size_t ret, xen_pfn_t *mfn_list)
+unreserve_pages(size_t npages, xen_pfn_t *mfn_list)
 {
-
        int s;
-       size_t npages;
+       size_t rpages;
        paddr_t pa;
        struct vm_page *pg;
        struct balloon_page_entry *bpg_entry;
                
-       for (npages = 0; npages < ret; npages++) {
+       for (rpages = 0; rpages < npages; rpages++) {
 
                if (SLIST_EMPTY(&balloon_conf.balloon_page_entries)) {
                        /*
                         * XXX: This is the case where extra "hot-plug"
                         * mem w.r.t boot comes in 
                         */
-                       printf("Balloon is empty. can't be collapsed further!");
+                       printf("Balloon empty. Cannot be collapsed further!\n");
                        break;
                }
 
@@ -351,33 +308,38 @@ unreserve_pages(size_t ret, xen_pfn_t *m
 
                kmem_free(bpg_entry, sizeof *bpg_entry);
 
-               s = splvm();
-
                /* Update P->M */
                pa = VM_PAGE_TO_PHYS(pg);
+
+               s = splvm();
+
                xpmap_phys_to_machine_mapping[
-                   (pa - XPMAP_OFFSET) >> PAGE_SHIFT] = mfn_list[npages];
+                   (pa - XPMAP_OFFSET) >> PAGE_SHIFT] = mfn_list[rpages];
 
                xpq_queue_machphys_update(
-                   ((paddr_t) (mfn_list[npages])) << PAGE_SHIFT, pa);
+                   ((paddr_t) (mfn_list[rpages])) << PAGE_SHIFT, pa);
 
-               xpq_flush_queue();
+               splx(s);
 
                /* Free it to UVM */
                uvm_pagefree(pg);
-
-               splx(s);
        }
 
-       return npages;
+       xpq_flush_queue();
+
+       return rpages;
 }
 
+/*
+ * Inflate balloon of @tpages pages. Pages are moved out of domain's memory
+ * to domain's balloon.
+ */
 static void
 balloon_inflate(size_t tpages)
 {
 
-       int s, ret;
-       size_t npages, respgcnt;
+       int i, s, ret;
+       size_t respgcnt;
 
        struct xen_memory_reservation reservation = {
                .address_bits = 0,
@@ -385,39 +347,24 @@ balloon_inflate(size_t tpages)
                .domid        = DOMID_SELF
        };
 
-
-       npages = xenmem_get_currentreservation();
-       KASSERT (npages > tpages);
-       npages -= tpages;
-
-
-       KASSERT(npages > 0);
-       KASSERT(npages <= BALLOON_DELTA);
-
-       memset(mfn_lista, 0, BALLOON_DELTA * sizeof *mfn_lista);
-
-       /* 
-        * There's a risk that npages might overflow ret. 
-        * Do this is smaller steps then.
-        * See: HYPERVISOR_memory_op(...) below....
+       /*
+        * Perform ballooning by increments of BALLOON_DELTA pages.
+        * This will put less pressure on the memory subsystem.
         */
+       for (i = 0; i < tpages / BALLOON_DELTA; i++) {
+               memset(mfn_list, 0, BALLOON_DELTA * sizeof(*mfn_list));
+               respgcnt = reserve_pages(tpages, mfn_list);
+
+               /* Hand over pages to Hypervisor */
+               xenguest_handle(reservation.extent_start) = mfn_list;
+               reservation.nr_extents = respgcnt;
 
-       if (npages > XEN_RESERVATION_MAX) {
-               return;
-       }
-
-       respgcnt = reserve_pages(npages, mfn_lista);
+               s = splvm();
+               ret = HYPERVISOR_memory_op(XENMEM_decrease_reservation,
+                                          &reservation);
+               splx(s);
 
-       if (respgcnt == 0) {
-               return;
        }
-       /* Hand over pages to Hypervisor */
-       xenguest_handle(reservation.extent_start) = mfn_lista;
-       reservation.nr_extents = respgcnt;
-
-       s = splvm();
-       ret = HYPERVISOR_memory_op(XENMEM_decrease_reservation, &reservation);
-       splx(s);
 
        if (ret > 0 && ret != respgcnt) {
 #if BALLOONDEBUG
@@ -426,7 +373,7 @@ balloon_inflate(size_t tpages)
                /* Unroll loop and release page frames back to the OS. */
                KASSERT(respgcnt > ret);
                if ((respgcnt - ret) !=
-                   unreserve_pages(respgcnt - ret, mfn_lista + ret)) {
+                   unreserve_pages(respgcnt - ret, mfn_list + ret)) {
                        panic("Could not unreserve balloon pages in "
                            "inflate incomplete path!");
                }
@@ -440,6 +387,9 @@ balloon_inflate(size_t tpages)
        return;
 }
 
+/*
+ * Deflate balloon of @tpages pages. Pages are given back to domain's memory.
+ */
 static void
 balloon_deflate(size_t tpages)
 {
@@ -453,7 +403,6 @@ balloon_deflate(size_t tpages)
                .domid        = DOMID_SELF
        };
 
-
        /* 
         * Trim npages, if it has exceeded the hard limit 
         */
@@ -479,9 +428,9 @@ balloon_deflate(size_t tpages)
        KASSERT(npages > 0);
        KASSERT(npages <= BALLOON_DELTA);
        
-       memset(mfn_lista, 0, BALLOON_DELTA * sizeof *mfn_lista);
+       memset(mfn_list, 0, BALLOON_DELTA * sizeof *mfn_list);
 
-       if (npages > XEN_RESERVATION_MAX) {
+       if (npages > balloon_conf.balloon_res_max) {
                return;
        }
 
@@ -497,17 +446,13 @@ balloon_deflate(size_t tpages)
           
        if (npages > balloon_reserve()) {
                npages = balloon_reserve();
-
 #if BALLOONDEBUG
                printf("\"hot-plug\" memory unsupported - clipping "
                    "reservation to %zd pages.\n", pgcur + npages);
 #endif
-               if (!npages) { /* Nothing to do */
-                       return;
-               }
        }
 
-       xenguest_handle(reservation.extent_start) = mfn_lista;
+       xenguest_handle(reservation.extent_start) = mfn_list;
        reservation.nr_extents = npages;
 
        s = splvm();
@@ -521,7 +466,7 @@ balloon_deflate(size_t tpages)
                return;
        }
 
-       npages = unreserve_pages(ret, mfn_lista);
+       npages = unreserve_pages(ret, mfn_list);
 
 #if BALLOONDEBUG
        printf("deflated by %zu\n", npages);
@@ -532,161 +477,129 @@ balloon_deflate(size_t tpages)
 }
 
 /*
- * Synchronous call that resizes reservation
+ * The balloon thread is responsible for managing balloon of the current
+ * domain. It can inflate/deflate it according to value of @target
+ * found in the balloon_conf structure.
  */
 static void
-balloon_resize(size_t targetpages)
+balloon_thread(void *ignore)
 {
 
-       size_t currentpages;
-
-       /* Get current number of pages */
-       currentpages = xenmem_get_currentreservation();
-
-       KASSERT(currentpages > 0);
+       int pollticks;
+       size_t currentpages, target;
+       xen_pfn_t *mfn_list;
 
-       if (targetpages == currentpages) {
+       /* Allocate list of MFNs for inflating/deflating balloon */
+       mfn_list = kmem_alloc(BALLOON_DELTA * sizeof(*mfn_list), KM_NOSLEEP);
+       if (mfn_list == NULL) {
+               aprint_error("%s: could not allocate mfn_list\n", __func__);
                return;
        }
 
-#if BALLOONDEBUG
-       printf("Current pages == %zu\n", currentpages);
-#endif
-
-       /* Increase or decrease, accordingly */
-       if (targetpages > currentpages) {
-               balloon_deflate(targetpages);
-       } else {
-               balloon_inflate(targetpages);
-       }
-
-       return;
-}
-
-static void
-balloon_thread(void *ignore)
-{
-
-       int i = 0, deltachunk = 0, pollticks;
-       size_t current, tgtcache;
-       ssize_t delta = 0; /* The balloon increment size */
-
-       pollticks = mstohz(BALLOONINTERVALMS);
-
-       /* 
-        * Get target. This will ensure that the wait loop (below)
-        * won't break out until the target is set properly for the
-        * first time. The value of targetinprogress is probably
-        * rubbish.
-        */
-
        for/*ever*/ (;;) {
 
-               mutex_enter(&balloon_conf.flaglock);
+               /* Set (or reset) timer */
+               pollticks = mstohz(BALLOONINTERVALMS);
 
-               while (!(delta = balloon_get_target() - 
-                        (current = xenmem_get_currentreservation()))) {
+               /* Monitor change of the target number of balloon pages */
+               for (;;) {
+                       currentpages = xenmem_get_currentreservation();
+
+                       mutex_tryenter(&balloon_conf.balloon_mtx);
+                       target = balloon_conf.balloon_target;
+                       if (currentpages != target) {
+                               /* there's some work to do */
+                               mutex_exit(&balloon_conf.balloon_mtx);
+                               break;
+                       }
 
-                       if (EWOULDBLOCK == 
-                           cv_timedwait(&balloon_conf.cv_memchanged,
-                                        &balloon_conf.flaglock, 
-                                        pollticks)) {
+                       /* no need for change -- wait for a signal */
+                       if (cv_timedwait(&balloon_conf.balloon_cv,
+                           &balloon_conf.balloon_mtx, 
+                           pollticks) == EWOULDBLOCK) {
                                /*
                                 * Get a bit more lethargic. Rollover
                                 * is ok.
                                 */
                                pollticks += mstohz(BALLOONINTERVALMS);
-
-                       } else { /* activity! Poll fast! */
-                               pollticks = mstohz(BALLOONINTERVALMS);
                        }
                }
 
-               KASSERT(delta <= INT_MAX && delta >= INT_MIN); /* int abs(int); */
-               KASSERT(abs(delta) < XEN_RESERVATION_MAX);
-
-               if (delta >= 0) {
-                        deltachunk = MIN(BALLOON_DELTA, delta);
-                } else {
-                        deltachunk = MAX(-BALLOON_DELTA, delta);
-                }
-
-               tgtcache = current + deltachunk;
-
-               if (deltachunk && i >= BALLOON_RETRIES) {
-                       tgtcache = xenmem_get_currentreservation();
-                       balloon_feedback_target(tgtcache);
-                       if (i > BALLOON_RETRIES) {
-                               /* Perhaps the "feedback" failed ? */
-                               panic("Multiple Balloon retry resets.\n");
-                       }
-
-#if BALLOONDEBUG
-                       printf("Aborted new target at %d tries\n", i);
-                       printf("Fed back new target value %zu\n", tgtcache);
-                       printf("delta == %zd\n", delta);
-                       printf("deltachunk == %d\n", deltachunk);
-#endif                 
-
-               } else {
-
+               /* Alright, now there is a new target to set */
 #if BALLOONDEBUG
-                       printf("new target ==> %zu\n", tgtcache);
+               printf("%s: new target: %zu\n", __func__, target);
 #endif
-                       balloon_resize(tgtcache);
-               }
-
-               current = xenmem_get_currentreservation();
 
-               /* 
-                * Every deltachunk gets a fresh set of
-                * BALLOON_RETRIES
+               /*
+                * We assume that xenbus_balloon_watcher() and
+                * sysctl(9) handlers checked the sanity of the
+                * new target value, so now we inflate/deflate balloon
+                * accordingly.
                 */
-               i = (current != tgtcache) ? i + 1 : 0; 
+               /* XXX: need error handling */
+               /* Increase or decrease, accordingly */
+               printf("%s: ", __func__);
+               if (target > currentpages) {
+                       printf("deflate: %zd\n", target - currentpages);
+                       //balloon_deflate(target - currentpages);
+               } else {
+                       printf("inflate: %zd\n", currentpages - target);
+                       //balloon_inflate(currentpages - target);
+               }
 
-               mutex_exit(&balloon_conf.flaglock);
+               /* XXX JYM remove, because it should be handled as error case */
+               mutex_enter(&balloon_conf.balloon_mtx);
+               balloon_conf.balloon_target = currentpages;
+               mutex_exit(&balloon_conf.balloon_mtx);
 
        }
-
 }
 
+/*
+ * Handler called when memory/target value changes inside Xenstore.
+ * All sanity checks must happen in this handler, as it is the common
+ * entry point to notify balloon thread.
+ */
 static void
 xenbus_balloon_watcher(struct xenbus_watch *watch, const char **vec,
                       unsigned int len)
 {
-       size_t new_target; /* In KB */
-
-       if (0 == (new_target = (size_t) xenbus_balloon_read_target())) {
-               /* Don't update target value */
+       size_t new_target;
+       uint64_t target_kb  = xenbus_balloon_read_target();
+       uint64_t target_max = BALLOON_PAGES_TO_KB(xenmem_get_maxreservation());
+
+       if (target_kb < balloon_conf.balloon_res_min) {
+               printf("Xen balloon: new_target %"PRIu64" unacceptable "
+                   "(below min: %"PRIu64")\n",
+                   target_kb, balloon_conf.balloon_res_min);
                return;
        }
-
-       new_target = BALLOON_PAGE_FLOOR(new_target);
-
-#if BALLOONDEBUG 
-       if (new_target < BALLOON_KB_TO_PAGES(balloon_conf.xen_res_min) ||
-           new_target > BALLOON_KB_TO_PAGES(balloon_conf.xen_res_max)) {
-               printf("Requested target is unacceptable.\n");
+       if (target_kb > target_max) {
+               /*
+                * Should not happen. Hypervisor should block balloon
+                * requests above mem-max.
+                */
+               printf("Xen balloon: new_target %"PRIu64" unacceptable "
+                   "(above max: %"PRIu64")\n",
+                   target_kb, target_max);
                return;
        }
-#endif
 
-       /* 
-        * balloon_set_target() calls
-        * xenbus_balloon_write_target(). Not sure if this is racy 
-        */
-       balloon_set_target(BALLOON_KB_TO_PAGES(new_target));
+       new_target = BALLOON_KB_TO_PAGES(target_kb);
 
 #if BALLOONDEBUG
-       printf("Setting target to %zu\n", new_target);
-       printf("Current reservation is %zu\n", xenmem_get_currentreservation());
+       printf("%s: current reservation: %zu pages\n",
+           __func__, xenmem_get_currentreservation());
+       printf("%s: new target: %zu pages\n", __func__, new_target);
 #endif
 
-       /* Notify balloon thread, if we can. */
-       if (mutex_tryenter(&balloon_conf.flaglock)) {
-               cv_signal(&balloon_conf.cv_memchanged);
-               mutex_exit(&balloon_conf.flaglock);
+       /* Only wake-up balloon thread if target changes. */
+       mutex_enter(&balloon_conf.balloon_mtx);
+       if (balloon_conf.balloon_target != new_target) {
+               balloon_conf.balloon_target = new_target;
+               cv_signal(&balloon_conf.balloon_cv);
        }
+       mutex_exit(&balloon_conf.balloon_mtx);
        
        return;
 }
@@ -697,55 +610,38 @@ balloon_xenbus_setup(void)
 
        size_t currentpages;
 
-       /* Allocate list of MFNs for inflating/deflating balloon */
-       mfn_lista = kmem_alloc(BALLOON_DELTA * sizeof *mfn_lista, KM_NOSLEEP);
-       if (mfn_lista == NULL) {
-               aprint_error("%s: could not allocate mfn_lista\n", __func__);
-               return;
-       }
-
-       /* Setup flaglocks, condvars et. al */
-       mutex_init(&balloon_conf.flaglock, MUTEX_DEFAULT, IPL_NONE);
-       mutex_init(&balloon_conf.tgtlock, MUTEX_DEFAULT, IPL_HIGH);
-       cv_init(&balloon_conf.cv_memchanged, "balloon");
+       /* Initialize target mutex and condvar */
+       mutex_init(&balloon_conf.balloon_mtx, MUTEX_DEFAULT, IPL_NONE);
+       cv_init(&balloon_conf.balloon_cv, "balloon");
 
        SLIST_INIT(&balloon_conf.balloon_page_entries);
        balloon_conf.balloon_num_page_entries = 0;
 
-       /* Deliberately not-constified for future extensibility */
-       balloon_conf.xen_res_min = XEN_RESERVATION_MIN;
-       balloon_conf.xen_res_max = XEN_RESERVATION_MAX; 
-
-#if BALLOONDEBUG
-       printf("uvmexp.freemin == %d\n", uvmexp.freemin);
-       printf("xen_res_min == %zu\n", balloon_conf.xen_res_min);
-       printf("xen_res_max == %zu\n", balloon_conf.xen_res_max);
-#endif
        /* Get current number of pages */
        currentpages = xenmem_get_currentreservation();
 
        KASSERT(currentpages > 0);
 
-       /* Update initial target value */
-       balloon_set_target(currentpages);
+       /* Update initial target value - no need to lock for initialization */
+       balloon_conf.balloon_target = currentpages;
 
-       /* 
-        * Initialise the sysctl_xxx copies of target and current
-        * as above, because sysctl inits before balloon_xenbus_setup()
-        */
-       sysctl_current = currentpages;
-       sysctl_target = BALLOON_PAGES_TO_KB(currentpages);
+       /* Set the values used by sysctl */
+       balloon_conf.balloon_res_min =
+           BALLOON_PAGES_TO_KB(XEN_RESERVATION_MIN);
+
+#if BALLOONDEBUG
+       printf("balloon current reservation: %"PRIu64"\n",
+           BALLOON_PAGES_TO_KB(currentpages));
+       printf("balloon min reservation: %"PRIu64"\n",
+           balloon_conf.balloon_res_min);
+       printf("balloon max reservation: %"PRIu64"\n", 
+           BALLOON_PAGES_TO_KB(xenmem_get_maxreservation()));
+#endif
 
        /* Setup xenbus node watch callback */
        if (register_xenbus_watch(&xenbus_balloon_watch)) {
                aprint_error("%s: unable to watch memory/target\n", __func__);
-               cv_destroy(&balloon_conf.cv_memchanged);
-               mutex_destroy(&balloon_conf.tgtlock);
-               mutex_destroy(&balloon_conf.flaglock);
-               kmem_free(mfn_lista, BALLOON_DELTA * sizeof *mfn_lista);
-               mfn_lista = NULL;
-               return;
-
+               goto error;
        }
 
        /* Setup kernel thread to asynchronously (in/de)-flate the balloon */
@@ -753,102 +649,128 @@ balloon_xenbus_setup(void)
                NULL /* arg */, NULL, "balloon")) {
                aprint_error("%s: unable to create balloon thread\n", __func__);
                unregister_xenbus_watch(&xenbus_balloon_watch);
-               cv_destroy(&balloon_conf.cv_memchanged);
-               mutex_destroy(&balloon_conf.tgtlock);
-               mutex_destroy(&balloon_conf.flaglock);
+               goto error;
        }
 
        return;
 
-}
+error:
 
-#if DOM0OPS
+       cv_destroy(&balloon_conf.balloon_cv);
+       mutex_destroy(&balloon_conf.balloon_mtx);
+       return;
+
+}
 
 /* 
  * sysctl(9) stuff 
  */
 
-/* sysctl helper routine */
+/* routine to control the minimum memory reserved for the domain */
 static int
-sysctl_kern_xen_balloon(SYSCTLFN_ARGS)
+sysctl_kern_xen_balloon_min(SYSCTLFN_ARGS)
 {
-
        struct sysctlnode node;
-
-       /* 
-        * Assumes SIZE_T_MAX <= ((uint64_t) -1) see createv() in
-        * SYSCTL_SETUP(...) below
-        */
-
+       u_quad_t newval;
        int error;
-       int64_t node_val;
 
-       KASSERT(rnode != NULL);
        node = *rnode;
+       node.sysctl_data = &newval;
+       newval = *(u_quad_t *)rnode->sysctl_data;
 
-       if (strcmp(node.sysctl_name, "current") == 0) {
-               node_val = BALLOON_PAGES_TO_KB(xenmem_get_currentreservation());
-               node.sysctl_data = &node_val;
-               return sysctl_lookup(SYSCTLFN_CALL(&node));
-#ifndef XEN_BALLOON /* Read only, if balloon is disabled */
-       } else if (strcmp(node.sysctl_name, "target") == 0) {
-               if (newp != NULL || newlen != 0) {
-                       return (EPERM);
-               }
-               node_val = BALLOON_PAGES_TO_KB(xenmem_get_currentreservation());
-               node.sysctl_data = &node_val;
-               error = sysctl_lookup(SYSCTLFN_CALL(&node));
+       error = sysctl_lookup(SYSCTLFN_CALL(&node));
+       if (error || newp == NULL)
                return error;
+
+       /* Safeguard value: refuse to go below. */
+       if (newval < XEN_RESERVATION_MIN) {
+               printf("WARNING: trying to balloon below minimum safe "
+                   "value: %"PRIu64"\n", XEN_RESERVATION_MIN);
+               return EINVAL;
        }
-#else
-       } else if (strcmp(node.sysctl_name, "target") == 0) {
-               node_val = * (int64_t *) rnode->sysctl_data;
-               node_val = BALLOON_PAGE_FLOOR(node_val);
-               node.sysctl_data = &node_val;
-               error = sysctl_lookup(SYSCTLFN_CALL(&node));
-               if (error != 0) {
-                       return error;
-               }
 
-               /* Sanity check new size */
-               if (node_val < BALLOON_PAGES_TO_KB(XEN_RESERVATION_MIN) || 
-                   node_val > BALLOON_PAGES_TO_KB(XEN_RESERVATION_MAX) ) {
-#if BALLOONDEBUG
-                       printf("node_val out of range.\n");
-                       printf("node_val = %"PRIu64"\n", node_val);
-#endif
-                       return EINVAL;
-               }
+       balloon_conf.balloon_res_min = newval;
+       return 0;
+}
 
-#if BALLOONDEBUG
-               printf("node_val = %"PRIu64"\n", node_val);
-#endif
+/* returns the maximum memory reservation of the domain */
+static int
+sysctl_kern_xen_balloon_max(SYSCTLFN_ARGS)
+{
+       struct sysctlnode node;
+       u_quad_t node_val;
 
-               if (node_val != BALLOON_PAGES_TO_KB(balloon_get_target())) {
-                       * (int64_t *) rnode->sysctl_data = node_val;
+       node = *rnode;
 
-#if BALLOONDEBUG
-                       printf("setting to %" PRIu64"\n", node_val);
-#endif
+       node_val = BALLOON_PAGES_TO_KB(xenmem_get_maxreservation());
+       node.sysctl_data = &node_val;
+       return sysctl_lookup(SYSCTLFN_CALL(&node));
+}
 
-                       balloon_set_target(BALLOON_KB_TO_PAGES(node_val));
+/* returns the current memory reservation of the domain */
+static int
+sysctl_kern_xen_balloon_current(SYSCTLFN_ARGS)
+{
+       struct sysctlnode node;
+       u_quad_t node_val;
 
-                       /* Notify balloon thread, if we can. */
-                       if (mutex_tryenter(&balloon_conf.flaglock)) {
-                               cv_signal(&balloon_conf.cv_memchanged);
-                               mutex_exit(&balloon_conf.flaglock);
-                       }
+       node = *rnode;
 
-                       /* Notify XenStore. */
-                       xenbus_balloon_write_target(node_val);
-               }
+       node_val = BALLOON_PAGES_TO_KB(xenmem_get_currentreservation());
+       node.sysctl_data = &node_val;
+       return sysctl_lookup(SYSCTLFN_CALL(&node));
+}
 
-               return 0;
+#ifdef XEN_BALLOON
+/* returns the target memory reservation of the domain */
+/* XXX query memory/target in Xenstore? */
+static int
+sysctl_kern_xen_balloon_target(SYSCTLFN_ARGS)
+{
+       struct sysctlnode node;
+       u_quad_t newval, maxres;
+       int error;
+
+       node = *rnode;
+       node.sysctl_data = &newval;
+       /* we are just reading the value of balloon_target, no lock needed */
+       newval = BALLOON_PAGES_TO_KB(*(u_quad_t*)rnode->sysctl_data);
+
+       error = sysctl_lookup(SYSCTLFN_CALL(&node));
+       if (newp == NULL || error != 0) {
+               return error;
+       }
+
+       /*
+        * Sanity check new size.
+        * We should not balloon below the minimum reservation
+        * set by the domain, nor above the maximum reservation set
+        * by the domain controller.
+        * Note: a domain is not supposed to receive balloon requests
+        * that are above its maximum reservation, but better safe than
+        * sorry.
+        */
+       maxres = BALLOON_PAGES_TO_KB(xenmem_get_maxreservation());
+       if (newval < balloon_conf.balloon_res_min ||
+           newval > maxres) {
+#if BALLOONDEBUG
+               printf("Trying to balloon out of bounds: %"PRIu64"\n",
+                   newval);
+               printf("min %"PRIu64", max %"PRIu64"\n",
+                   balloon_conf.balloon_res_min, maxres);
+#endif
+               return EINVAL;
        }
-#endif /* XEN_BALLOON */
 
-       return EINVAL;
+       /*
+        * Write new value inside Xenstore. This will fire the memory/target
+        * watch handler, xenbus_balloon_watcher().
+        */
+       xenbus_balloon_write_target(newval);
+
+       return 0;
 }
+#endif /* XEN_BALLOON */
 
 /* Setup nodes. */
 SYSCTL_SETUP(sysctl_kern_xen_balloon_setup, "sysctl kern.xen.balloon setup")
@@ -880,20 +802,40 @@ SYSCTL_SETUP(sysctl_kern_xen_balloon_set
            CTL_CREATE, CTL_EOL);
 
        sysctl_createv(clog, 0, &node, NULL,
-           CTLFLAG_PERMANENT,
-           CTLTYPE_QUAD, "current",
-           SYSCTL_DESCR("current memory reservation from "
-               "hypervisor, in pages."),
-           sysctl_kern_xen_balloon, 0, &sysctl_current, 0,
+           CTLFLAG_PERMANENT | CTLFLAG_READONLY,
+           CTLTYPE_QUAD, "mem-current",
+           SYSCTL_DESCR("Domain's current memory reservation from "
+               "hypervisor, in KiB."),
+           sysctl_kern_xen_balloon_current, 0,
+           NULL, 0,
            CTL_CREATE, CTL_EOL);
 
+#ifdef XEN_BALLOON
        sysctl_createv(clog, 0, &node, NULL,
            CTLFLAG_PERMANENT | CTLFLAG_READWRITE,
-           CTLTYPE_QUAD, "target",
-           SYSCTL_DESCR("Target memory reservation to adjust "
-               "balloon size to, in pages"),
-           sysctl_kern_xen_balloon, 0, &sysctl_target, 0,
+           CTLTYPE_QUAD, "mem-target",
+           SYSCTL_DESCR("Target memory reservation for domain, in KiB."),
+           sysctl_kern_xen_balloon_target, 0,
+           &balloon_conf.balloon_target, 0,
+           CTL_CREATE, CTL_EOL);
+#endif /* XEN_BALLOON */
+
+       sysctl_createv(clog, 0, &node, NULL,
+           CTLFLAG_PERMANENT | CTLFLAG_READWRITE,
+           CTLTYPE_QUAD, "mem-min",
+           SYSCTL_DESCR("Minimum amount of memory the domain "
+               "reserves, in KiB."),
+           sysctl_kern_xen_balloon_min, 0,
+           &balloon_conf.balloon_res_min, 0,
            CTL_CREATE, CTL_EOL);
-}
 
-#endif /* DOM0OPS */
+       sysctl_createv(clog, 0, &node, NULL,
+           CTLFLAG_PERMANENT | CTLFLAG_READONLY,
+           CTLTYPE_QUAD, "mem-max",
+           SYSCTL_DESCR("Maximum amount of memory the domain "
+               "can use, in KiB."),
+           sysctl_kern_xen_balloon_max, 0,
+           NULL, 0,
+           CTL_CREATE, CTL_EOL);
+
+}
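
For readers who want to try the unit handling outside the kernel, here is a minimal userland sketch of the KiB/page conversion and of the bounds check done by the mem-target handler in the patch above. It is not part of the patch. All sketch_* names are hypothetical, and a 4 KiB page size (PAGE_SHIFT of 12) is assumed; the real driver uses the BALLOON_PAGES_TO_KB() and BALLOON_KB_TO_PAGES() macros, whose definitions are not shown in this part of the diff.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, i.e. PAGE_SHIFT == 12 as on x86. */
#define SKETCH_PAGE_SHIFT	12

/* Convert a page count to KiB (hypothetical stand-in for BALLOON_PAGES_TO_KB). */
static uint64_t
sketch_pages_to_kb(uint64_t pages)
{
	return pages << (SKETCH_PAGE_SHIFT - 10);
}

/* Convert KiB to whole pages, rounding down (stand-in for BALLOON_KB_TO_PAGES). */
static uint64_t
sketch_kb_to_pages(uint64_t kb)
{
	return kb >> (SKETCH_PAGE_SHIFT - 10);
}

/*
 * Mirror of the sanity check in the mem-target handler: the target (KiB)
 * must not be below the domain's own minimum (KiB), nor above the maximum
 * reservation, which the hypervisor reports in pages.
 * Returns 0 if the target is acceptable, EINVAL otherwise.
 */
static int
sketch_check_target(uint64_t target_kb, uint64_t min_kb, uint64_t max_pages)
{
	uint64_t max_kb = sketch_pages_to_kb(max_pages);

	if (target_kb < min_kb || target_kb > max_kb)
		return EINVAL;
	return 0;
}

/* Small self-check with made-up numbers: min 32 MiB, max 128 MiB. */
static void
sketch_selfcheck(void)
{
	assert(sketch_check_target(65536, 32768, 32768) == 0);
	assert(sketch_check_target(16384, 32768, 32768) == EINVAL);
}
```

Keeping every user-visible value in KiB and converting only at the hypercall boundary means the comparison above is always done in a single unit, which is the main reason I prefer KiB over pages for the sysctl nodes.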

