Port-xen archive
Xen balloon driver rewrite
Hi list,
So, in an attempt to add most of the missing features to the current Xen
balloon driver, I ended up rewriting most of the logic behind it. It's not
yet finished, but really close to it (FYI, I am attaching a patch).
Only the workqueue part remains to be done; it is about one or two hours of
coding, then testing. The balloon will be enabled by default for -6.
The old design used a dedicated thread to queue balloon operations and
handle inflating/deflating. The "new" driver will instead be workqueue(9)
based, as it simplifies locking and the handling of errors from
ballooning, especially error feedback from balloon_thread. I will
now simply log an error and terminate the worker.
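
To illustrate the kind of error feedback a worker has to handle, here is a
minimal userspace sketch (not the patch's code) that moves pages in chunks of
BALLOON_DELTA and stops as soon as a step falls short, so the caller can log
the error and give up. All ex_* names are hypothetical; only the value 256
comes from the patch.

```c
#include <stddef.h>

#define EX_BALLOON_DELTA 256	/* pages per step, same value as in the patch */

/* Hypothetical callback: move up to npages, return how many were moved. */
typedef size_t (*ex_step_fn)(size_t npages, void *arg);

/*
 * Move `total` pages in chunks of at most EX_BALLOON_DELTA.
 * Stops early when a step moves fewer pages than requested.
 * Returns the number of pages actually moved.
 */
static size_t
ex_balloon_in_steps(size_t total, ex_step_fn step, void *arg)
{
	size_t done = 0;

	while (done < total) {
		size_t want = total - done;
		size_t got;

		if (want > EX_BALLOON_DELTA)
			want = EX_BALLOON_DELTA;

		got = step(want, arg);
		done += got;
		if (got < want)
			break;	/* partial step: caller logs and terminates */
	}
	return done;
}

/* Example step: take pages from a pool holding *arg available pages. */
static size_t
ex_pool_take(size_t npages, void *arg)
{
	size_t *avail = arg;
	size_t got = npages < *avail ? npages : *avail;

	*avail -= got;
	return got;
}
```

A caller compares the return value with the requested total; a shortfall is
the error case that the worker would log before terminating.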
The sysctl tree kern.xen.balloon has 4 nodes (values in KiB):
- mem-max: the mem-max value associated to the domain, obtained from
XenStore.
- mem-min: a safeguard value, so a domain refuses to balloon memory
below this mark (protective measure)
- mem-target: a target for the balloon's memory. May not be reached,
especially if it tries to go below the mem-min value.
- mem-current: the current memory reservation of the domain.
Note that ballooning is always an operation that requires the domain's
cooperation.
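
As a rough sketch of how these four values relate, the snippet below accepts a
new target only when it lies between mem-min and mem-max, which mirrors the
bounds check in the patch's sysctl handler. The struct and function names are
hypothetical and this is plain userspace C, not the kernel code.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical snapshot of the kern.xen.balloon nodes, all in KiB. */
struct ex_balloon_limits {
	uint64_t mem_min;	/* safeguard: never balloon below this */
	uint64_t mem_max;	/* upper bound set for the domain */
	uint64_t mem_target;	/* requested reservation */
	uint64_t mem_current;	/* current reservation */
};

/*
 * Accept new_target_kb only if it lies within [mem_min, mem_max].
 * Returns 0 on success, EINVAL otherwise (the target is left untouched).
 */
static int
ex_set_target(struct ex_balloon_limits *bl, uint64_t new_target_kb)
{
	if (new_target_kb < bl->mem_min || new_target_kb > bl->mem_max)
		return EINVAL;

	bl->mem_target = new_target_kb;
	return 0;
}
```

Rejecting an out-of-range request, rather than silently clamping it, keeps
mem-target equal to what the user last asked for successfully.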
All values are in KiB. From a user perspective, I am wondering if the
values should be given either in bytes, or pages:
- the Xen hypercalls use pages, of PAGE_SIZE size.
- the XenStore stores some of these values in KiB.
I tend to be against values in pages, because they are often obtained
from values in bytes, converted through PAGE_SHIFT shifts manually.
Also, I assume that someday, maybe, sysctl(8) will be able to use
dehumanize_number().
Does anyone have an opinion on this matter? My current code uses uint64_t for
sysctl KiB values, so they can be converted to bytes safely. Using just
pages would turn the code to size_t everywhere though.
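
For reference, a minimal sketch of the conversions under discussion, done in
uint64_t. It assumes 4 KiB pages (the kernel takes PAGE_SIZE from the machine
headers); the ex_* names are hypothetical and only echo the macros in the
attached patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as on x86. */
#define EX_PAGE_SHIFT	12
#define EX_PAGE_SIZE	(UINT64_C(1) << EX_PAGE_SHIFT)
#define EX_PAGE_SIZE_KB	(EX_PAGE_SIZE >> 10)	/* page size in KiB */

/* Pages -> KiB. */
static uint64_t
ex_pages_to_kb(uint64_t pages)
{
	/* Guard against overflow of the multiplication. */
	assert(pages <= UINT64_MAX / EX_PAGE_SIZE_KB);
	return pages * EX_PAGE_SIZE_KB;
}

/* KiB -> pages, rounding up to a whole page. */
static uint64_t
ex_kb_to_pages(uint64_t kb)
{
	return kb / EX_PAGE_SIZE_KB + (kb % EX_PAGE_SIZE_KB != 0);
}

/* KiB -> bytes. Returns 0 on success, -1 if the result would overflow. */
static int
ex_kb_to_bytes(uint64_t kb, uint64_t *bytes)
{
	if (kb > (UINT64_MAX >> 10))
		return -1;
	*bytes = kb << 10;
	return 0;
}
```

Rounding up in the KiB-to-pages direction means a request that is not a
multiple of the page size is never satisfied with less memory than asked for.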
--
Jean-Yves Migeon
jeanyves.migeon%free.fr@localhost
Index: sys/arch/xen/xen/balloon.c
===================================================================
RCS file: /cvsroot/src/sys/arch/xen/xen/balloon.c,v
retrieving revision 1.6
diff -u -p -r1.6 balloon.c
--- sys/arch/xen/xen/balloon.c 12 Nov 2010 13:18:59 -0000 1.6
+++ sys/arch/xen/xen/balloon.c 6 Apr 2011 08:35:52 -0000
@@ -31,29 +31,36 @@
*/
/*
- * The Xen balloon driver enables growing and shrinking PV
- * domains on the fly, by allocating and freeing memory directly.
- */
-
-#define BALLOONDEBUG 1
-
-/*
- * sysctl TODOs:
- * xen.balloon
- * xen.balloon.current: DONE
- * xen.balloon.target: DONE
- * xen.balloon.low-balloon: In Progress
- * xen.balloon.high-balloon: In Progress
- * xen.balloon.limit: XXX
+ * The Xen balloon driver enables growing and shrinking PV domains
+ * memory on the fly, by allocating and freeing memory pages directly.
+ * This management needs domain cooperation to work properly, especially
+ * during balloon_inflate() operation, where a domain gives back memory to
+ * the hypervisor.
+ *
+ * Shrinking memory on a live system is a difficult task, and may render
+ * it unstable or lead to crash. The driver takes a conservative approach
+ * there by doing memory operations in small steps of a few MiB each time. It
+ * will also refuse to decrease reservation below a certain threshold
+ * (XEN_RESERVATION_MIN), so as to avoid a complete kernel memory exhaustion.
*
- * sysctl labels = { 'current' : 'Current allocation',
- * 'target' : 'Requested target',
- * 'low-balloon' : 'Low-mem balloon',
- * 'high-balloon' : 'High-mem balloon',
- * 'limit' : 'Xen hard limit' }
+ * XXX The balloon driver does not currently "plug" new pages into uvm(9)
+ * when more memory is available than at boot time. So ballooning above
+ * physmem is rather useless.
*
+ * The user can intervene at two different levels to manage the ballooning
+ * of a domain:
+ * - directly within the domain, using a sysctl(9) interface.
+ * - through the Xentools, by modifying the memory/target entry associated
+ * to a domain. This is usually done in dom0.
+ *
+ * Both sysctl(9) nodes and memory/target entry assume that the values passed
+ * to them are in KiB. Internally, the driver will convert this value in
+ * pages (assuming a page is PAGE_SIZE bytes), and issue the correct hypercalls
+ * to decrease/increase domain's reservation accordingly.
*/
+#define BALLOONDEBUG 1
+
#include <sys/cdefs.h>
__KERNEL_RCSID(0, "$NetBSD: balloon.c,v 1.6 2010/11/12 13:18:59 uebayasi Exp $");
@@ -78,47 +85,50 @@ __KERNEL_RCSID(0, "$NetBSD: balloon.c,v
#define BALLOONINTERVALMS 100 /* milliseconds */
-#define BALLOON_DELTA 1024 /* The maximum increments allowed in a
+#define BALLOON_DELTA 256 /* The maximum increments allowed in a
* single call of balloon_inflate() or
- * balloon_deflate
+ * balloon_deflate()
*/
#define BALLOON_RETRIES 4 /* Number of time every (in|de)flate of
* BALLOON_DELTA or less, occurs
*/
-/* XXX: fix limits */
-#define BALLOON_BALLAST 256 /* In pages */
+/*
+ * Safeguard value. Refuse to go below this threshold, so that domain
+ * can keep some free pages for its own use. Value is arbitrary, and may
+ * evolve with time.
+ */
+#define BALLOON_BALLAST 256 /* In pages - 1MiB */
#define XEN_RESERVATION_MIN (uvmexp.freemin + BALLOON_BALLAST) /* In pages */
-#define XEN_RESERVATION_MAX nkmempages /* In pages */
/* KB <-> PAGEs */
-#define BALLOON_PAGES_TO_KB(_pg) (_pg * PAGE_SIZE / 1024)
-#define BALLOON_KB_TO_PAGES(_kb) (_kb * 1024 / PAGE_SIZE)
-#define BALLOON_PAGE_FLOOR(_kb) (_kb & PAGE_MASK)
+#define PAGE_SIZE_KB (PAGE_SIZE >> 10) /* page size in KB */
+#define BALLOON_PAGES_TO_KB(_pg) ((uint64_t)_pg * PAGE_SIZE_KB)
+#define BALLOON_KB_TO_PAGES(_kb) (roundup(_kb, PAGE_SIZE_KB) / PAGE_SIZE_KB)
/* Forward declaration */
static void xenbus_balloon_watcher(struct xenbus_watch *, const char **,
unsigned int);
+/*
+ * A balloon page entry. Needed to track pages put/reclaimed from balloon
+ */
struct balloon_page_entry {
struct vm_page *pg;
SLIST_ENTRY(balloon_page_entry) entry;
};
static struct balloon_conf {
- kmutex_t flaglock; /* Protects condvar (below) */
- kcondvar_t cv_memchanged; /* Notifier flag for target (below) */
-
- kmutex_t tgtlock; /* Spin lock, protects .target, below */
- size_t target; /* Target VM reservation size, in pages. */
+ kmutex_t balloon_mtx; /* Mutex, protects condvar and target (below) */
+ kcondvar_t balloon_cv; /* Condvar variable for target (below) */
+ size_t balloon_target; /* Target domain reservation size, in pages. */
- /* The following are not protected by above locks */
+ /* Linked list of pages used by balloon */
SLIST_HEAD(, balloon_page_entry) balloon_page_entries;
size_t balloon_num_page_entries;
- /* Balloon limits */
- size_t xen_res_min;
- size_t xen_res_max;
+ /* Minimum amount of memory reserved by domain, in KiB */
+ uint64_t balloon_res_min;
} balloon_conf;
static struct xenbus_watch xenbus_balloon_watch = {
@@ -126,17 +136,13 @@ static struct xenbus_watch xenbus_balloo
.xbw_callback = xenbus_balloon_watcher,
};
-static uint64_t sysctl_current;
-static uint64_t sysctl_target;
-
/* List of MFNs for inflating/deflating balloon */
-static xen_pfn_t *mfn_lista;
+static xen_pfn_t *mfn_list;
/* Returns zero, on error */
static size_t
xenmem_get_maxreservation(void)
{
-#if 0 /* XXX: Fix this call */
int s, ret;
s = splvm();
@@ -151,12 +157,9 @@ xenmem_get_maxreservation(void)
}
return ret;
-#else
- return nkmempages;
-#endif
}
-/* Returns zero, on error */
+/* Returns current reservation, in pages */
static size_t
xenmem_get_currentreservation(void)
{
@@ -170,25 +173,13 @@ xenmem_get_currentreservation(void)
if (ret < 0) {
panic("Could not obtain hypervisor current "
"reservation for VM\n");
- return 0;
}
return ret;
}
-/*
- * The target value is managed in 3 variables:
- * a) Incoming xenbus copy, maintained by the hypervisor.
- * b) sysctl_target: This is an incoming target value via the
- * sysctl(9) interface.
- * c) balloon_conf.target
- * This is the canonical current target that the driver tries to
- * attain.
- *
- */
-
-
-static size_t
+/* Get value (in KiB) of memory/target in XenStore for current domain */
+static uint64_t
xenbus_balloon_read_target(void)
{
unsigned long long new_target;
@@ -198,16 +189,13 @@ xenbus_balloon_read_target(void)
return 0;
}
- /* Returned in KB */
-
return new_target;
}
+/* Set memory/target value (in KiB) in XenStore for current domain */
static void
xenbus_balloon_write_target(unsigned long long new_target)
{
-
- /* new_target is in KB */
if (0 != xenbus_printf(NULL, "memory", "target", "%llu", new_target)) {
printf("error, couldn't write xenbus target node\n");
}
@@ -215,57 +203,31 @@ xenbus_balloon_write_target(unsigned lon
return;
}
-static size_t
-balloon_get_target(void)
-{
- size_t target;
-
- mutex_spin_enter(&balloon_conf.tgtlock);
- target = balloon_conf.target;
- mutex_spin_exit(&balloon_conf.tgtlock);
-
- return target;
-
-}
-
-static void
-balloon_set_target(size_t target)
-{
-
- mutex_spin_enter(&balloon_conf.tgtlock);
- balloon_conf.target = target;
- mutex_spin_exit(&balloon_conf.tgtlock);
-
- return;
-
-}
-
/*
* This is the special case where, due to the driver not reaching
* current balloon_conf.target, a new value is internally calculated
* and fed back to both the sysctl and the xenbus interfaces,
* described above.
*/
+#if 0
static void
balloon_feedback_target(size_t newtarget)
{
/* Notify XenStore. */
xenbus_balloon_write_target(BALLOON_PAGES_TO_KB(newtarget));
/* Update sysctl value XXX: Locking ? */
- sysctl_target = BALLOON_PAGES_TO_KB(newtarget);
+ sysctl_target = newtarget;
/* Finally update our private copy */
- balloon_set_target(newtarget);
-}
-
-
-/* Number of pages currently used up by balloon */
-static size_t
-balloon_reserve(void)
-{
- return balloon_conf.balloon_num_page_entries;
+ //balloon_set_target(newtarget);
}
+#endif
+/*
+ * Reserve @npages pages of domain's memory. For each reserved page, add
+ * it to the list of MFNs that will be passed as argument to hypervisor
+ * memory operation
+ */
static size_t
reserve_pages(size_t npages, xen_pfn_t *mfn_list)
{
@@ -279,10 +241,13 @@ reserve_pages(size_t npages, xen_pfn_t *
for (rpages = 0; rpages < npages; rpages++) {
- pg = uvm_pagealloc(NULL, 0, NULL,
- UVM_PGA_ZERO);
+ pg = uvm_pagealloc(NULL, 0, NULL, UVM_PGA_ZERO);
+ if (pg == NULL)
+ break;
- if (pg == NULL) {
+ bpg_entry = kmem_alloc(sizeof *bpg_entry, KM_SLEEP);
+ if (bpg_entry == NULL) {
+ uvm_pagefree(pg);
break;
}
@@ -294,24 +259,12 @@ reserve_pages(size_t npages, xen_pfn_t *
/* Invalidate pg */
xpmap_phys_to_machine_mapping[
- (pa - XPMAP_OFFSET) >> PAGE_SHIFT
+ (pa - XPMAP_OFFSET) >> PAGE_SHIFT
] = INVALID_P2M_ENTRY;
splx(s);
- /* Save mfn */
- /*
- * XXX: We don't keep a copy, but just save a pointer
- * to the uvm pg handle. Is this ok ?
- */
-
- bpg_entry = kmem_alloc(sizeof *bpg_entry, KM_SLEEP);
-
- if (bpg_entry == NULL) {
- uvm_pagefree(pg);
- break;
- }
-
+ /* Save MFN */
bpg_entry->pg = pg;
SLIST_INSERT_HEAD(&balloon_conf.balloon_page_entries,
@@ -322,24 +275,28 @@ reserve_pages(size_t npages, xen_pfn_t *
return rpages;
}
+/*
+ * Reclaim @npages pages from domain's balloon. For each reclaimed page,
+ * remove it from the list of reserved pages, and give them back to
+ * uvm(9).
+ */
static size_t
-unreserve_pages(size_t ret, xen_pfn_t *mfn_list)
+unreserve_pages(size_t npages, xen_pfn_t *mfn_list)
{
-
int s;
- size_t npages;
+ size_t rpages;
paddr_t pa;
struct vm_page *pg;
struct balloon_page_entry *bpg_entry;
- for (npages = 0; npages < ret; npages++) {
+ for (rpages = 0; rpages < npages; rpages++) {
if (SLIST_EMPTY(&balloon_conf.balloon_page_entries)) {
/*
* XXX: This is the case where extra "hot-plug"
* mem w.r.t boot comes in
*/
- printf("Balloon is empty. can't be collapsed further!");
+ printf("Balloon empty. Cannot be collapsed further!\n");
break;
}
@@ -351,33 +308,38 @@ unreserve_pages(size_t ret, xen_pfn_t *m
kmem_free(bpg_entry, sizeof *bpg_entry);
- s = splvm();
-
/* Update P->M */
pa = VM_PAGE_TO_PHYS(pg);
+
+ s = splvm();
+
xpmap_phys_to_machine_mapping[
- (pa - XPMAP_OFFSET) >> PAGE_SHIFT] = mfn_list[npages];
+ (pa - XPMAP_OFFSET) >> PAGE_SHIFT] = mfn_list[rpages];
xpq_queue_machphys_update(
- ((paddr_t) (mfn_list[npages])) << PAGE_SHIFT, pa);
+ ((paddr_t) (mfn_list[rpages])) << PAGE_SHIFT, pa);
- xpq_flush_queue();
+ splx(s);
/* Free it to UVM */
uvm_pagefree(pg);
-
- splx(s);
}
- return npages;
+ xpq_flush_queue();
+
+ return rpages;
}
+/*
+ * Inflate balloon of @tpages pages. Pages are moved out of domain's memory
+ * to domain's balloon.
+ */
static void
balloon_inflate(size_t tpages)
{
- int s, ret;
- size_t npages, respgcnt;
+ int i, s, ret;
+ size_t respgcnt;
struct xen_memory_reservation reservation = {
.address_bits = 0,
@@ -385,39 +347,24 @@ balloon_inflate(size_t tpages)
.domid = DOMID_SELF
};
-
- npages = xenmem_get_currentreservation();
- KASSERT (npages > tpages);
- npages -= tpages;
-
-
- KASSERT(npages > 0);
- KASSERT(npages <= BALLOON_DELTA);
-
- memset(mfn_lista, 0, BALLOON_DELTA * sizeof *mfn_lista);
-
- /*
- * There's a risk that npages might overflow ret.
- * Do this is smaller steps then.
- * See: HYPERVISOR_memory_op(...) below....
+ /*
+ * Perform ballooning by increments of BALLOON_DELTA pages.
+ * This will put less pressure on the memory subsystem.
*/
+ for (i = 0; i < tpages / BALLOON_DELTA; i++) {
+ memset(mfn_list, 0, BALLOON_DELTA * sizeof(*mfn_list));
+ respgcnt = reserve_pages(BALLOON_DELTA, mfn_list);
+
+ /* Hand over pages to Hypervisor */
+ xenguest_handle(reservation.extent_start) = mfn_list;
+ reservation.nr_extents = respgcnt;
- if (npages > XEN_RESERVATION_MAX) {
- return;
- }
-
- respgcnt = reserve_pages(npages, mfn_lista);
+ s = splvm();
+ ret = HYPERVISOR_memory_op(XENMEM_decrease_reservation,
+ &reservation);
+ splx(s);
- if (respgcnt == 0) {
- return;
}
- /* Hand over pages to Hypervisor */
- xenguest_handle(reservation.extent_start) = mfn_lista;
- reservation.nr_extents = respgcnt;
-
- s = splvm();
- ret = HYPERVISOR_memory_op(XENMEM_decrease_reservation, &reservation);
- splx(s);
if (ret > 0 && ret != respgcnt) {
#if BALLOONDEBUG
@@ -426,7 +373,7 @@ balloon_inflate(size_t tpages)
/* Unroll loop and release page frames back to the OS. */
KASSERT(respgcnt > ret);
if ((respgcnt - ret) !=
- unreserve_pages(respgcnt - ret, mfn_lista + ret)) {
+ unreserve_pages(respgcnt - ret, mfn_list + ret)) {
panic("Could not unreserve balloon pages in "
"inflate incomplete path!");
}
@@ -440,6 +387,9 @@ balloon_inflate(size_t tpages)
return;
}
+/*
+ * Deflate balloon of @tpages pages. Pages are given back to domain's memory.
+ */
static void
balloon_deflate(size_t tpages)
{
@@ -453,7 +403,6 @@ balloon_deflate(size_t tpages)
.domid = DOMID_SELF
};
-
/*
* Trim npages, if it has exceeded the hard limit
*/
@@ -479,9 +428,9 @@ balloon_deflate(size_t tpages)
KASSERT(npages > 0);
KASSERT(npages <= BALLOON_DELTA);
- memset(mfn_lista, 0, BALLOON_DELTA * sizeof *mfn_lista);
+ memset(mfn_list, 0, BALLOON_DELTA * sizeof *mfn_list);
- if (npages > XEN_RESERVATION_MAX) {
+ if (npages > balloon_conf.balloon_res_max) {
return;
}
@@ -497,17 +446,13 @@ balloon_deflate(size_t tpages)
if (npages > balloon_reserve()) {
npages = balloon_reserve();
-
#if BALLOONDEBUG
printf("\"hot-plug\" memory unsupported - clipping "
"reservation to %zd pages.\n", pgcur + npages);
#endif
- if (!npages) { /* Nothing to do */
- return;
- }
}
- xenguest_handle(reservation.extent_start) = mfn_lista;
+ xenguest_handle(reservation.extent_start) = mfn_list;
reservation.nr_extents = npages;
s = splvm();
@@ -521,7 +466,7 @@ balloon_deflate(size_t tpages)
return;
}
- npages = unreserve_pages(ret, mfn_lista);
+ npages = unreserve_pages(ret, mfn_list);
#if BALLOONDEBUG
printf("deflated by %zu\n", npages);
@@ -532,161 +477,129 @@ balloon_deflate(size_t tpages)
}
/*
- * Synchronous call that resizes reservation
+ * The balloon thread is responsible for managing balloon of the current
+ * domain. It can inflate/deflate it according to value of @target
+ * found in the balloon_conf structure.
*/
static void
-balloon_resize(size_t targetpages)
+balloon_thread(void *ignore)
{
- size_t currentpages;
-
- /* Get current number of pages */
- currentpages = xenmem_get_currentreservation();
-
- KASSERT(currentpages > 0);
+ int pollticks;
+ size_t currentpages, target;
+ xen_pfn_t *mfn_list;
- if (targetpages == currentpages) {
+ /* Allocate list of MFNs for inflating/deflating balloon */
+ mfn_list = kmem_alloc(BALLOON_DELTA * sizeof(*mfn_list), KM_NOSLEEP);
+ if (mfn_list == NULL) {
+ aprint_error("%s: could not allocate mfn_list\n", __func__);
return;
}
-#if BALLOONDEBUG
- printf("Current pages == %zu\n", currentpages);
-#endif
-
- /* Increase or decrease, accordingly */
- if (targetpages > currentpages) {
- balloon_deflate(targetpages);
- } else {
- balloon_inflate(targetpages);
- }
-
- return;
-}
-
-static void
-balloon_thread(void *ignore)
-{
-
- int i = 0, deltachunk = 0, pollticks;
- size_t current, tgtcache;
- ssize_t delta = 0; /* The balloon increment size */
-
- pollticks = mstohz(BALLOONINTERVALMS);
-
- /*
- * Get target. This will ensure that the wait loop (below)
- * won't break out until the target is set properly for the
- * first time. The value of targetinprogress is probably
- * rubbish.
- */
-
for/*ever*/ (;;) {
- mutex_enter(&balloon_conf.flaglock);
+ /* Set (or reset) timer */
+ pollticks = mstohz(BALLOONINTERVALMS);
- while (!(delta = balloon_get_target() -
- (current = xenmem_get_currentreservation()))) {
+ /* Monitor change of the target number of balloon pages */
+ for (;;) {
+ currentpages = xenmem_get_currentreservation();
+
+ mutex_enter(&balloon_conf.balloon_mtx);
+ target = balloon_conf.balloon_target;
+ if (currentpages != target) {
+ /* there's some work to do */
+ mutex_exit(&balloon_conf.balloon_mtx);
+ break;
+ }
- if (EWOULDBLOCK ==
- cv_timedwait(&balloon_conf.cv_memchanged,
- &balloon_conf.flaglock,
- pollticks)) {
+ /* no need for change -- wait for a signal */
+ if (cv_timedwait(&balloon_conf.balloon_cv,
+ &balloon_conf.balloon_mtx,
+ pollticks) == EWOULDBLOCK) {
/*
* Get a bit more lethargic. Rollover
* is ok.
*/
pollticks += mstohz(BALLOONINTERVALMS);
-
- } else { /* activity! Poll fast! */
- pollticks = mstohz(BALLOONINTERVALMS);
}
}
- KASSERT(delta <= INT_MAX && delta >= INT_MIN); /* int abs(int); */
- KASSERT(abs(delta) < XEN_RESERVATION_MAX);
-
- if (delta >= 0) {
- deltachunk = MIN(BALLOON_DELTA, delta);
- } else {
- deltachunk = MAX(-BALLOON_DELTA, delta);
- }
-
- tgtcache = current + deltachunk;
-
- if (deltachunk && i >= BALLOON_RETRIES) {
- tgtcache = xenmem_get_currentreservation();
- balloon_feedback_target(tgtcache);
- if (i > BALLOON_RETRIES) {
- /* Perhaps the "feedback" failed ? */
- panic("Multiple Balloon retry resets.\n");
- }
-
-#if BALLOONDEBUG
- printf("Aborted new target at %d tries\n", i);
- printf("Fed back new target value %zu\n", tgtcache);
- printf("delta == %zd\n", delta);
- printf("deltachunk == %d\n", deltachunk);
-#endif
-
- } else {
-
+ /* Alright, now there is a new target to set */
#if BALLOONDEBUG
- printf("new target ==> %zu\n", tgtcache);
+ printf("%s: new target: %zu\n", __func__, target);
#endif
- balloon_resize(tgtcache);
- }
-
- current = xenmem_get_currentreservation();
- /*
- * Every deltachunk gets a fresh set of
- * BALLOON_RETRIES
+ /*
+ * We assume that xenbus_balloon_watcher() and
+ * sysctl(9) handlers checked the sanity of the
+ * new target value, so now we inflate/deflate balloon
+ * accordingly.
*/
- i = (current != tgtcache) ? i + 1 : 0;
+ /* XXX: need error handling */
+ /* Increase or decrease, accordingly */
+ printf("%s: ", __func__);
+ if (target > currentpages) {
+ printf("deflate: %zd\n", target - currentpages);
+ //balloon_deflate(target - currentpages);
+ } else {
+ printf("inflate: %zd\n", currentpages - target);
+ //balloon_inflate(currentpages - target);
+ }
- mutex_exit(&balloon_conf.flaglock);
+ /* XXX JYM remove, because it should be handled as error case */
+ mutex_enter(&balloon_conf.balloon_mtx);
+ balloon_conf.balloon_target = currentpages;
+ mutex_exit(&balloon_conf.balloon_mtx);
}
-
}
+/*
+ * Handler called when memory/target value changes inside Xenstore.
+ * All sanity checks must happen in this handler, as it is the common
+ * entry point to notify balloon thread.
+ */
static void
xenbus_balloon_watcher(struct xenbus_watch *watch, const char **vec,
unsigned int len)
{
- size_t new_target; /* In KB */
-
- if (0 == (new_target = (size_t) xenbus_balloon_read_target())) {
- /* Don't update target value */
+ size_t new_target;
+ uint64_t target_kb = xenbus_balloon_read_target();
+ uint64_t target_max = BALLOON_PAGES_TO_KB(xenmem_get_maxreservation());
+
+ if (target_kb < balloon_conf.balloon_res_min) {
+ printf("Xen balloon: new_target %"PRIu64" unacceptable "
+ "(below min: %"PRIu64")\n",
+ target_kb, balloon_conf.balloon_res_min);
return;
}
-
- new_target = BALLOON_PAGE_FLOOR(new_target);
-
-#if BALLOONDEBUG
- if (new_target < BALLOON_KB_TO_PAGES(balloon_conf.xen_res_min) ||
- new_target > BALLOON_KB_TO_PAGES(balloon_conf.xen_res_max)) {
- printf("Requested target is unacceptable.\n");
+ if (target_kb > target_max) {
+ /*
+ * Should not happen. Hypervisor should block balloon
+ * requests above mem-max.
+ */
+ printf("Xen balloon: new_target %"PRIu64" unacceptable "
+ "(above max: %"PRIu64")\n",
+ target_kb, target_max);
return;
}
-#endif
- /*
- * balloon_set_target() calls
- * xenbus_balloon_write_target(). Not sure if this is racy
- */
- balloon_set_target(BALLOON_KB_TO_PAGES(new_target));
+ new_target = BALLOON_KB_TO_PAGES(target_kb);
#if BALLOONDEBUG
- printf("Setting target to %zu\n", new_target);
- printf("Current reservation is %zu\n", xenmem_get_currentreservation());
+ printf("%s: current reservation: %zu pages\n",
+ __func__, xenmem_get_currentreservation());
+ printf("%s: new target: %zu pages\n", __func__, new_target);
#endif
- /* Notify balloon thread, if we can. */
- if (mutex_tryenter(&balloon_conf.flaglock)) {
- cv_signal(&balloon_conf.cv_memchanged);
- mutex_exit(&balloon_conf.flaglock);
+ /* Only wake-up balloon thread if target changes. */
+ mutex_enter(&balloon_conf.balloon_mtx);
+ if (balloon_conf.balloon_target != new_target) {
+ balloon_conf.balloon_target = new_target;
+ cv_signal(&balloon_conf.balloon_cv);
}
+ mutex_exit(&balloon_conf.balloon_mtx);
return;
}
@@ -697,55 +610,38 @@ balloon_xenbus_setup(void)
size_t currentpages;
- /* Allocate list of MFNs for inflating/deflating balloon */
- mfn_lista = kmem_alloc(BALLOON_DELTA * sizeof *mfn_lista, KM_NOSLEEP);
- if (mfn_lista == NULL) {
- aprint_error("%s: could not allocate mfn_lista\n", __func__);
- return;
- }
-
- /* Setup flaglocks, condvars et. al */
- mutex_init(&balloon_conf.flaglock, MUTEX_DEFAULT, IPL_NONE);
- mutex_init(&balloon_conf.tgtlock, MUTEX_DEFAULT, IPL_HIGH);
- cv_init(&balloon_conf.cv_memchanged, "balloon");
+ /* Initialize target mutex and condvar */
+ mutex_init(&balloon_conf.balloon_mtx, MUTEX_DEFAULT, IPL_NONE);
+ cv_init(&balloon_conf.balloon_cv, "balloon");
SLIST_INIT(&balloon_conf.balloon_page_entries);
balloon_conf.balloon_num_page_entries = 0;
- /* Deliberately not-constified for future extensibility */
- balloon_conf.xen_res_min = XEN_RESERVATION_MIN;
- balloon_conf.xen_res_max = XEN_RESERVATION_MAX;
-
-#if BALLOONDEBUG
- printf("uvmexp.freemin == %d\n", uvmexp.freemin);
- printf("xen_res_min == %zu\n", balloon_conf.xen_res_min);
- printf("xen_res_max == %zu\n", balloon_conf.xen_res_max);
-#endif
/* Get current number of pages */
currentpages = xenmem_get_currentreservation();
KASSERT(currentpages > 0);
- /* Update initial target value */
- balloon_set_target(currentpages);
+ /* Update initial target value - no need to lock for initialization */
+ balloon_conf.balloon_target = currentpages;
- /*
- * Initialise the sysctl_xxx copies of target and current
- * as above, because sysctl inits before balloon_xenbus_setup()
- */
- sysctl_current = currentpages;
- sysctl_target = BALLOON_PAGES_TO_KB(currentpages);
+ /* Set the values used by sysctl */
+ balloon_conf.balloon_res_min =
+ BALLOON_PAGES_TO_KB(XEN_RESERVATION_MIN);
+
+#if BALLOONDEBUG
+ printf("balloon current reservation: %"PRIu64"\n",
+ BALLOON_PAGES_TO_KB(currentpages));
+ printf("balloon min reservation: %"PRIu64"\n",
+ balloon_conf.balloon_res_min);
+ printf("balloon max reservation: %"PRIu64"\n",
+ BALLOON_PAGES_TO_KB(xenmem_get_maxreservation()));
+#endif
/* Setup xenbus node watch callback */
if (register_xenbus_watch(&xenbus_balloon_watch)) {
aprint_error("%s: unable to watch memory/target\n", __func__);
- cv_destroy(&balloon_conf.cv_memchanged);
- mutex_destroy(&balloon_conf.tgtlock);
- mutex_destroy(&balloon_conf.flaglock);
- kmem_free(mfn_lista, BALLOON_DELTA * sizeof *mfn_lista);
- mfn_lista = NULL;
- return;
-
+ goto error;
}
/* Setup kernel thread to asynchronously (in/de)-flate the balloon */
@@ -753,102 +649,128 @@ balloon_xenbus_setup(void)
NULL /* arg */, NULL, "balloon")) {
aprint_error("%s: unable to create balloon thread\n", __func__);
unregister_xenbus_watch(&xenbus_balloon_watch);
- cv_destroy(&balloon_conf.cv_memchanged);
- mutex_destroy(&balloon_conf.tgtlock);
- mutex_destroy(&balloon_conf.flaglock);
+ goto error;
}
return;
-}
+error:
-#if DOM0OPS
+ cv_destroy(&balloon_conf.balloon_cv);
+ mutex_destroy(&balloon_conf.balloon_mtx);
+ return;
+
+}
/*
* sysctl(9) stuff
*/
-/* sysctl helper routine */
+/* routine to control the minimum memory reserved for the domain */
static int
-sysctl_kern_xen_balloon(SYSCTLFN_ARGS)
+sysctl_kern_xen_balloon_min(SYSCTLFN_ARGS)
{
-
struct sysctlnode node;
-
- /*
- * Assumes SIZE_T_MAX <= ((uint64_t) -1) see createv() in
- * SYSCTL_SETUP(...) below
- */
-
+ u_quad_t newval;
int error;
- int64_t node_val;
- KASSERT(rnode != NULL);
node = *rnode;
+ node.sysctl_data = &newval;
+ newval = *(u_quad_t *)rnode->sysctl_data;
- if (strcmp(node.sysctl_name, "current") == 0) {
- node_val = BALLOON_PAGES_TO_KB(xenmem_get_currentreservation());
- node.sysctl_data = &node_val;
- return sysctl_lookup(SYSCTLFN_CALL(&node));
-#ifndef XEN_BALLOON /* Read only, if balloon is disabled */
- } else if (strcmp(node.sysctl_name, "target") == 0) {
- if (newp != NULL || newlen != 0) {
- return (EPERM);
- }
- node_val = BALLOON_PAGES_TO_KB(xenmem_get_currentreservation());
- node.sysctl_data = &node_val;
- error = sysctl_lookup(SYSCTLFN_CALL(&node));
+ error = sysctl_lookup(SYSCTLFN_CALL(&node));
+ if (error || newp == NULL)
return error;
+
+ /* Safeguard value: refuse to go below. */
+ if (newval < XEN_RESERVATION_MIN) {
+ printf("WARNING: trying to balloon below minimum safe "
+ "value: %"PRIu64"\n", XEN_RESERVATION_MIN);
+ return EINVAL;
}
-#else
- } else if (strcmp(node.sysctl_name, "target") == 0) {
- node_val = * (int64_t *) rnode->sysctl_data;
- node_val = BALLOON_PAGE_FLOOR(node_val);
- node.sysctl_data = &node_val;
- error = sysctl_lookup(SYSCTLFN_CALL(&node));
- if (error != 0) {
- return error;
- }
- /* Sanity check new size */
- if (node_val < BALLOON_PAGES_TO_KB(XEN_RESERVATION_MIN) ||
- node_val > BALLOON_PAGES_TO_KB(XEN_RESERVATION_MAX) ) {
-#if BALLOONDEBUG
- printf("node_val out of range.\n");
- printf("node_val = %"PRIu64"\n", node_val);
-#endif
- return EINVAL;
- }
+ balloon_conf.balloon_res_min = newval;
+ return 0;
+}
-#if BALLOONDEBUG
- printf("node_val = %"PRIu64"\n", node_val);
-#endif
+/* returns the maximum memory reservation of the domain */
+static int
+sysctl_kern_xen_balloon_max(SYSCTLFN_ARGS)
+{
+ struct sysctlnode node;
+ u_quad_t node_val;
- if (node_val != BALLOON_PAGES_TO_KB(balloon_get_target())) {
- * (int64_t *) rnode->sysctl_data = node_val;
+ node = *rnode;
-#if BALLOONDEBUG
- printf("setting to %" PRIu64"\n", node_val);
-#endif
+ node_val = BALLOON_PAGES_TO_KB(xenmem_get_maxreservation());
+ node.sysctl_data = &node_val;
+ return sysctl_lookup(SYSCTLFN_CALL(&node));
+}
- balloon_set_target(BALLOON_KB_TO_PAGES(node_val));
+/* returns the current memory reservation of the domain */
+static int
+sysctl_kern_xen_balloon_current(SYSCTLFN_ARGS)
+{
+ struct sysctlnode node;
+ u_quad_t node_val;
- /* Notify balloon thread, if we can. */
- if (mutex_tryenter(&balloon_conf.flaglock)) {
- cv_signal(&balloon_conf.cv_memchanged);
- mutex_exit(&balloon_conf.flaglock);
- }
+ node = *rnode;
- /* Notify XenStore. */
- xenbus_balloon_write_target(node_val);
- }
+ node_val = BALLOON_PAGES_TO_KB(xenmem_get_currentreservation());
+ node.sysctl_data = &node_val;
+ return sysctl_lookup(SYSCTLFN_CALL(&node));
+}
- return 0;
+#ifdef XEN_BALLOON
+/* returns the target memory reservation of the domain */
+/* XXX query memory/target in Xenstore? */
+static int
+sysctl_kern_xen_balloon_target(SYSCTLFN_ARGS)
+{
+ struct sysctlnode node;
+ u_quad_t newval, maxres;
+ int error;
+
+ node = *rnode;
+ node.sysctl_data = &newval;
+ /* we are just reading the value of balloon_target, no lock needed */
+ newval = BALLOON_PAGES_TO_KB(*(u_quad_t*)rnode->sysctl_data);
+
+ error = sysctl_lookup(SYSCTLFN_CALL(&node));
+ if (newp == NULL || error != 0) {
+ return error;
+ }
+
+ /*
+ * Sanity check new size
+ * We should not balloon below the minimum reservation
+ * set by the domain, nor above the maximum reservation set
+ * by domain controller.
+ * Note: domain is not supposed to receive balloon requests when
+ * they are above maximum reservation, but better be safe than
+ * sorry.
+ */
+ maxres = BALLOON_PAGES_TO_KB(xenmem_get_maxreservation());
+ if (newval < balloon_conf.balloon_res_min ||
+ newval > maxres) {
+#if BALLOONDEBUG
+ printf("Trying to balloon out of bounds: %"PRIu64"\n",
+ newval);
+ printf("min %"PRIu64", max %"PRIu64"\n",
+ balloon_conf.balloon_res_min, maxres);
+#endif
+ return EINVAL;
}
-#endif /* XEN_BALLOON */
- return EINVAL;
+ /*
+ * Write new value inside Xenstore. This will fire the memory/target
+ * watch handler, xenbus_balloon_watcher().
+ */
+ xenbus_balloon_write_target(newval);
+
+ return 0;
}
+#endif /* XEN_BALLOON */
/* Setup nodes. */
SYSCTL_SETUP(sysctl_kern_xen_balloon_setup, "sysctl kern.xen.balloon setup")
@@ -880,20 +802,40 @@ SYSCTL_SETUP(sysctl_kern_xen_balloon_set
CTL_CREATE, CTL_EOL);
sysctl_createv(clog, 0, &node, NULL,
- CTLFLAG_PERMANENT,
- CTLTYPE_QUAD, "current",
- SYSCTL_DESCR("current memory reservation from "
- "hypervisor, in pages."),
- sysctl_kern_xen_balloon, 0, &sysctl_current, 0,
+ CTLFLAG_PERMANENT | CTLFLAG_READONLY,
+ CTLTYPE_QUAD, "mem-current",
+ SYSCTL_DESCR("Domain's current memory reservation from "
+ "hypervisor, in KiB."),
+ sysctl_kern_xen_balloon_current, 0,
+ NULL, 0,
CTL_CREATE, CTL_EOL);
+#ifdef XEN_BALLOON
sysctl_createv(clog, 0, &node, NULL,
CTLFLAG_PERMANENT | CTLFLAG_READWRITE,
- CTLTYPE_QUAD, "target",
- SYSCTL_DESCR("Target memory reservation to adjust "
- "balloon size to, in pages"),
- sysctl_kern_xen_balloon, 0, &sysctl_target, 0,
+ CTLTYPE_QUAD, "mem-target",
+ SYSCTL_DESCR("Target memory reservation for domain, in KiB."),
+ sysctl_kern_xen_balloon_target, 0,
+ &balloon_conf.balloon_target, 0,
+ CTL_CREATE, CTL_EOL);
+#endif /* XEN_BALLOON */
+
+ sysctl_createv(clog, 0, &node, NULL,
+ CTLFLAG_PERMANENT | CTLFLAG_READWRITE,
+ CTLTYPE_QUAD, "mem-min",
+ SYSCTL_DESCR("Minimum amount of memory the domain "
+ "reserves, in KiB."),
+ sysctl_kern_xen_balloon_min, 0,
+ &balloon_conf.balloon_res_min, 0,
CTL_CREATE, CTL_EOL);
-}
-#endif /* DOM0OPS */
+ sysctl_createv(clog, 0, &node, NULL,
+ CTLFLAG_PERMANENT | CTLFLAG_READONLY,
+ CTLTYPE_QUAD, "mem-max",
+ SYSCTL_DESCR("Maximum amount of memory the domain "
+ "can use, in KiB."),
+ sysctl_kern_xen_balloon_max, 0,
+ NULL, 0,
+ CTL_CREATE, CTL_EOL);
+
+}