tech-kern archive


Re: Parallelize page queues



> Date: Tue, 5 Dec 2017 06:00:38 +0000
> From: Abhirup Das <abhirup.das%hotmail.com@localhost>
> 
> Hi, I just found this project idea under the GSoC ideas:
> 
> https://wiki.netbsd.org/projects/project/page_queues/
> 
> Has this project already been completed? If not, what documentation and resources do I need to build this, and where can I find them?
> 
> I would like to take this project on from January. 

Hi, Abhirup!

The first thing to do for this is to make sure you're comfortable
building and testing the NetBSD kernel.  You can do this in a virtual
machine, e.g. qemu, though eventually it will be necessary to test
things on a physical machine with many cores.

Next, the subsystem that this project mainly requires changing is uvm,
which lives in src/sys/uvm.  See the uvm(9) man page for some general
information about it and references to papers on the subject.  Most of
it is not important for this project (e.g., uvm_map).  You should get
some familiarity with how uvm objects are used, how uvm page
allocation and lookup work, and how the uvm page daemon works.
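
To make that concrete, here's roughly what page allocation looks like
from a caller's side (an illustrative sketch, not code from the tree:
uvm_pagealloc(9) and uvm_wait() are the real entry points, but "uobj"
and "off" here just stand for whatever object and offset the caller
is working with):

	struct vm_page *pg;

	/* The object's lock must be held across alloc + insert.  */
	mutex_enter(uobj->vmobjlock);
	pg = uvm_pagealloc(uobj, off, NULL, UVM_PGA_ZERO);
	if (pg == NULL) {
		/* No free pages: unlock, wait for pagedaemon, retry.  */
		mutex_exit(uobj->vmobjlock);
		uvm_wait("pgalloc");
		/* ...retry the allocation... */
	}
	/*
	 * pg comes back PG_BUSY|PG_FAKE|PG_CLEAN; the caller fills it
	 * in and then unbusies it.
	 */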

I've attached a draft (not even compile-tested, just sketched) of the
plan I had in mind for parallelizing the _free_ page queues, which are
the queues of pages of RAM that are not in use at all.  This replaces
some of the functions in uvm_page.c, and may require some changes to,
e.g., struct uvm_cpu -- not sure offhand.
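
For orientation, the per-CPU state the draft expects struct uvm_cpu to
carry is roughly the following (field names are the ones the sketch
uses; exact types, and which members already exist, would need
checking against the real struct):

	struct uvm_cpu {
		/*
		 * Per-CPU free page queues, mirroring the global
		 * layout: one pgfreelist per VM freelist, each with
		 * per-color buckets and zero/unknown queues.
		 */
		struct pgfreelist page_free[VM_NFREELIST];
		unsigned	npages[PGFL_NQUEUES];	/* pages per queue */
		unsigned	page_free_nextcolor;	/* next color to use */
		bool		page_idle_zero;		/* zero pages when idle */
		unsigned	nanonpages;		/* anon-owned pages */
		struct {
			uint64_t pga_zerohit, pga_zeromiss;
			uint64_t colorhit, colormiss;
			uint64_t nzeropages;
		}		stats;
		/* ...existing members... */
	};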

I don't have a draft for the other two parts of the project:
parallelizing access to the active/inactive page queues, and
parallelizing page lookups.  Those require a little more research to
study our access patterns and decide on approaches to experiment with.

Feel free to post questions here or to me personally about this
project.
/*	$NetBSD$	*/

/*-
 * Copyright (c) 2015 The NetBSD Foundation, Inc.
 * All rights reserved.
 *
 * This code is derived from software contributed to The NetBSD Foundation
 * by Taylor R. Campbell.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 *
 * THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
 * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
 * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
 * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
 * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
 * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
 * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
 * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
 * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
 * POSSIBILITY OF SUCH DAMAGE.
 */

#include <sys/cdefs.h>
__KERNEL_RCSID(0, "$NetBSD$");

/*
 * uvm_cpu_get: Disable preemption and return the current CPU's uvm
 * state.  Pair with uvm_cpu_put.
 */
static struct uvm_cpu *
uvm_cpu_get(void)
{

	kpreempt_disable();
	return curcpu()->ci_data.cpu_uvm;
}

/*
 * uvm_cpu_put: Release the uvm_cpu returned by uvm_cpu_get and
 * re-enable preemption.
 */
static void
uvm_cpu_put(struct uvm_cpu *ucpu __diagused)
{

	KASSERT(kpreempt_disabled());
	KASSERT(ucpu == curcpu()->ci_data.cpu_uvm);
	kpreempt_enable();
}

/* Sysctl knobs.  */
unsigned	uvm_pagealloc_refill_batch = 256;	/* XXX */
unsigned	uvm_pagealloc_refill_lowat = 128;	/* XXX */
unsigned	uvm_pagealloc_refill_hiwat = 4096;	/* XXX */
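
/*
 * Illustrative only, not part of the draft: one way these knobs might
 * be exported, using sysctl_createv(9) under the existing "vm" node
 * (needs <sys/sysctl.h>; the node names are made up, and the knobs
 * would probably become plain ints to match CTLTYPE_INT).
 */
#if 0
SYSCTL_SETUP(sysctl_uvm_pagealloc_refill, "uvm pagealloc refill knobs")
{

	sysctl_createv(clog, 0, NULL, NULL,
	    CTLFLAG_PERMANENT|CTLFLAG_READWRITE,
	    CTLTYPE_INT, "pagealloc_refill_batch",
	    SYSCTL_DESCR("Pages moved per refill of a CPU's free queues"),
	    NULL, 0, &uvm_pagealloc_refill_batch, 0,
	    CTL_VM, CTL_CREATE, CTL_EOL);
	/* ...and likewise for _lowat and _hiwat... */
}
#endif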

/* Global free page queue lock.  */
kmutex_t	uvm_fpageqlock;

/*
 * uvm_pagealloc_strat: Allocate a page to be owned by uobj or uanon at
 * offset off, using the specified strategy.
 */
struct vm_page *
uvm_pagealloc_strat(struct uvm_object *uobj, voff_t off, struct vm_anon *uanon,
    int flags, int strategy, int stratarg)
{
	struct uvm_cpu *ucpu;
	unsigned color;
	struct lwp *l;
	unsigned q0, q1;
	unsigned pgflno;
	struct vm_page *pg;
	bool need_zero;

	KASSERT((uobj == NULL) ^ (uanon == NULL));
	KASSERT(uanon == NULL || (flags & UVM_FLAG_COLORMATCH) || off == 0);
	KASSERT(off == trunc_page(off));
	KASSERT(uobj == NULL || mutex_owned(uobj->vmobjlock));

	/* Lock the CPU.  */
	ucpu = uvm_cpu_get();

	/* Choose a color.  */
	if (flags & UVM_FLAG_COLORMATCH)
		color = atop(off) & uvmexp.colormask;
	else
		color = ucpu->page_free_nextcolor;

	/*
	 * Allow kernel and realtime threads to use the kernel reserve
	 * (but not the pagedaemon reserve).
	 *
	 * XXX OOPS!  Nothing uses this flag just yet.  Where to put
	 * it?  In uvm_pagealloc_refill_cpu, probably.
	 */
	l = curlwp;
	if (__predict_true(l != NULL) && lwp_eprio(l) >= PRI_KTHREAD)
		flags |= UVM_PGA_USERESERVE;

	/* Decide whether to prefer zero'd pages or unknown pages.  */
	CTASSERT(PGFL_NQUEUES == 2);
	if (flags & UVM_PGA_ZERO) {
		q0 = PGFL_ZEROS;
		q1 = PGFL_UNKNOWN;
	} else {
		q0 = PGFL_UNKNOWN;
		q1 = PGFL_ZEROS;
	}

	/* Apply the page allocation strategy.  */
	switch (strategy) {
	case UVM_PGA_STRAT_ONLY:	/* must be on requested freelist */
	case UVM_PGA_STRAT_FALLBACK:	/* start with requested freelist */
		KASSERT(0 <= stratarg);
		KASSERT(stratarg < VM_NFREELIST);
		pgflno = stratarg;
		pg = uvm_pagealloc_pgfl(ucpu, pgflno, q0, q1, &color, flags);
		if (pg != NULL)
			break;
		if (strategy == UVM_PGA_STRAT_ONLY)
			break;
		/* FALLTHROUGH */
	case UVM_PGA_STRAT_NORMAL:	/* try all freelists */
		for (pgflno = 0; pgflno < VM_NFREELIST; pgflno++) {
			pg = uvm_pagealloc_pgfl(ucpu, pgflno, q0, q1, &color,
			    flags);
			if (pg != NULL)
				break;
		}
		break;
	default:
		panic("uvm_pagealloc_strat: invalid strategy: %d", strategy);
	}

	if (pg == NULL)
		goto out;
	KASSERT(pg->pqflags == PQ_FREE);

	/* Remember the color for the next allocation.  */
	ucpu->page_free_nextcolor = (color + 1) & uvmexp.colormask;

	/* Account zero hits/misses, and decide whether to zero.  */
	if (flags & UVM_PGA_ZERO) {
		if (pg->flags & PG_ZERO) {
			ucpu->stats.pga_zerohit++;
			need_zero = false;
		} else {
			ucpu->stats.pga_zeromiss++;
			need_zero = true;
		}
		if (ucpu->npages[PGFL_ZEROS] < ucpu->npages[PGFL_UNKNOWN])
			ucpu->page_idle_zero = vm_page_zero_enable;
	}

	/*
	 * Assign the page's identity:
	 * - owned by specified obj or anon at specified offset
	 * - busy: busied for caller
	 * - fake: not yet initialized
	 * - clean: caller must initialize before unbusying
	 */
	pg->offset = off;
	pg->uobject = uobj;
	pg->uanon = uanon;
	pg->flags = PG_BUSY|PG_CLEAN|PG_FAKE;
	pg->pqflags = 0;
	if (uanon) {
		uanon->an_page = pg;
		pg->pqflags |= PQ_ANON;
		ucpu->nanonpages++;
	} else if (uobj) {
		uvm_pageinsert(uobj, pg);
	}

	/* Set up ownership tracking.  */
#ifdef UVM_PAGE_TRKOWN
	pg->owner_tag = NULL;
#endif
	UVM_PAGE_OWN(pg, "new alloc");

	/* Zero the page if caller asked and we're not sure.  */
	if (flags & UVM_PGA_ZERO) {
		/* Zero'd pages are not clean.  (XXX Why not?)  */
		pg->flags &= ~PG_CLEAN;
		if (need_zero)
			pmap_zero_page(VM_PAGE_TO_PHYS(pg));
	}

out:	/* Unlock the CPU and return the page, if any.  */
	uvm_cpu_put(ucpu);
	return pg;
}

/*
 * uvm_pagealloc_pgfl: Allocate a page from the specified page
 * freelist, preferring queue q0 over queue q1, using *colorp exactly
 * if flags & UVM_FLAG_COLORMATCH or else trying *colorp first and
 * returning the actual color in *colorp.
 *
 * Try the local CPU's queues first.  If no match is found, refill
 * them from the global queues and retry.
 */
static struct vm_page *
uvm_pagealloc_pgfl(struct uvm_cpu *ucpu, unsigned pgflno, unsigned q0,
    unsigned q1, unsigned *colorp, int flags)
{
	unsigned color, wantcolor = *colorp;
	struct vm_page *pg;

	KASSERT(kpreempt_disabled());

	/*
	 * Iterate over the colors outside the CPU/global alternation,
	 * if user wants colormatch.  Otherwise, we will try other
	 * colors in the local CPU queue first and in the global queues
	 * next.
	 */
	color = wantcolor;
	do {
		pg = uvm_pagealloc_pgfl_cpu(ucpu, pgflno, q0, q1, &color,
		    flags);
		if (__predict_true(pg != NULL))
			goto out;
		/*
		 * Couldn't find a matching page on the local CPU's
		 * queues.  Try to refill the local CPU's queues from
		 * the global queues.
		 */
		KASSERT((flags & UVM_FLAG_COLORMATCH) || color == *colorp);
		uvm_pagealloc_refill_cpu(ucpu, pgflno, q0, q1, color, flags);
		pg = uvm_pagealloc_pgfl_cpu(ucpu, pgflno, q0, q1, &color,
		    flags);
		if (__predict_true(pg != NULL))
			goto out;
		if (!(flags & UVM_FLAG_COLORMATCH))
			break;
		color = (color + 1) & uvmexp.colormask;
	} while (color != wantcolor);

	/* Nada.  Give up.  */
	return NULL;

out:	/* Got a page.  */
	if (color == *colorp) {
		ucpu->stats.colorhit++;
	} else {
		ucpu->stats.colormiss++;
		*colorp = color;
	}

	return pg;
}

/*
 * uvm_pagealloc_pgfl_cpu: Try to grab a page from the per-CPU
 * freelist pgflno, trying first queue q0 and then queue q1.  Use
 * *colorp exactly if flags & UVM_FLAG_COLORMATCH; otherwise start
 * with *colorp but try every other color too.
 */
static inline struct vm_page *
uvm_pagealloc_pgfl_cpu(struct uvm_cpu *ucpu, unsigned pgflno, unsigned q0,
    unsigned q1, unsigned *colorp, int flags)
{
	unsigned color, startcolor = *colorp;
	struct vm_page *pg = NULL;

	KASSERT(kpreempt_disabled());

	color = startcolor;
	do {
		pg = uvm_pagealloc_pgfl_cpu_qc(ucpu, pgflno, q0, color, flags);
		if (pg != NULL)
			goto out;
		pg = uvm_pagealloc_pgfl_cpu_qc(ucpu, pgflno, q1, color, flags);
		if (pg != NULL)
			goto out;
		color = (color + 1) & uvmexp.colormask;
	} while (color != startcolor && !(flags & UVM_FLAG_COLORMATCH));

	/* Nothing found.  */
	return NULL;

out:	*colorp = color;
	return pg;
}

/*
 * uvm_pagealloc_pgfl_cpu_qc: Try to grab a page with the specified
 * color from the queue q of the page freelist pgflno on the local
 * CPU.
 */
static inline struct vm_page *
uvm_pagealloc_pgfl_cpu_qc(struct uvm_cpu *ucpu, unsigned pgflno,
    unsigned q, unsigned color, int flags)
{
	struct pgflist *pgflist;
	struct vm_page *pg;

	KASSERT(kpreempt_disabled());

	pgflist = &ucpu->page_free[pgflno].pgfl_buckets[color].pgfl_queues[q];
	pg = LIST_FIRST(pgflist);
	if (pg == NULL)
		return NULL;
	LIST_REMOVE(pg, pageq.list);

	/* Make sure the page looks sane.  */
	KASSERT(pg->pqflags & PQ_FREE);
	KASSERT(ucpu == VM_FREE_PAGE_TO_CPU(pg));

	/* Account for it.  */
	KASSERT(0 < ucpu->npages[q]);
	ucpu->npages[q]--;

	/*
	 * If we went below the low water mark, try to refill from the
	 * global queues so we'll be ready for the next allocation.
	 */
	CTASSERT(PGFL_NQUEUES == 2);
	if (ucpu->npages[q] + ucpu->npages[1 - q] < uvm_pagealloc_refill_lowat)
		uvm_pagealloc_refill_cpu(ucpu, pgflno, q, 1 - q, color, flags);

	/* Account for zero pages.  */
	if (pg->flags & PG_ZERO) {
		KASSERT(q == PGFL_ZEROS);
		ucpu->stats.nzeropages--;
	} else {
		KASSERT(q == PGFL_UNKNOWN);
	}

	return pg;
}

/*
 * uvm_pagealloc_refill_cpu: Try to refill the local CPU's queues from
 * the global queues.  Prefer queue q0, then try q1.
 *
 * XXX OOPS!  Does it make sense to grab pages for this color?  Seems
 * we ought to cycle through the colors.
 */
static void
uvm_pagealloc_refill_cpu(struct uvm_cpu *ucpu, unsigned pgflno, unsigned q0,
    unsigned q1, unsigned color, int flags)
{
	struct pgflbucket *cpu_bucket, *global_bucket;
	size_t nreq;

	KASSERT(kpreempt_disabled());

	cpu_bucket = &ucpu->page_free[pgflno].pgfl_buckets[color];
	CTASSERT(PGFL_NQUEUES == 2);
	KASSERT(LIST_EMPTY(&cpu_bucket->pgfl_queues[q0]));
	KASSERT(LIST_EMPTY(&cpu_bucket->pgfl_queues[q1]));

	mutex_spin_enter(&uvm_fpageqlock);
	global_bucket = &uvm.page_free[pgflno].pgfl_buckets[color];
	nreq = uvm_pagealloc_refill_batch;
	KASSERT(0 < nreq);

	/* First try to refill from the preferred queue.  */
	while (nreq > 0 &&
	    uvm_pagealloc_refill_1(ucpu, cpu_bucket, global_bucket, q0))
		nreq--;

	/* If that's not enough, try to refill from the other queue.  */
	while (nreq > 0 &&
	    uvm_pagealloc_refill_1(ucpu, cpu_bucket, global_bucket, q1))
		nreq--;

	/* If that's still not enough, kick the pagedaemon.  */
	if (nreq > 0)
		uvm_kick_pdaemon();
	mutex_spin_exit(&uvm_fpageqlock);
}

/*
 * uvm_pagealloc_refill_1: Move one page on queue q from the global
 * bucket to the local CPU's bucket.  Return true if a page was moved,
 * false if the global queue was empty.
 */
static bool
uvm_pagealloc_refill_1(struct uvm_cpu *ucpu, struct pgflbucket *cpu_bucket,
    struct pgflbucket *global_bucket, unsigned q)
{
	struct vm_page *pg;

	KASSERT(mutex_owned(&uvm_fpageqlock));

	pg = LIST_FIRST(&global_bucket->pgfl_queues[q]);
	if (pg == NULL)
		return false;
	LIST_REMOVE(pg, pageq.list);
	uvm.npages[q]--;
	LIST_INSERT_HEAD(&cpu_bucket->pgfl_queues[q], pg, pageq.list);
	ucpu->npages[q]++;
	return true;
}

void
uvm_pagefree(struct vm_page *pg)
{
	struct uvm_cpu *ucpu;
	struct pgfreelist *pgfreelist;
	struct pgflist *pgflist;
	unsigned pgflno;
	unsigned q;
	unsigned color;
	bool is_zero;

	/* XXX destroy loan */

	/* Lock the CPU.  */
	ucpu = uvm_cpu_get();

	/* Disassociate the page from its identity.  */
	if (pg->uobject != NULL) {
		KASSERT(pg->uanon == NULL); /* Can't be loaned.  */
		uvm_pageremove(pg->uobject, pg);
	} else if (pg->uanon != NULL) {
		pg->uanon->an_page = NULL;
		ucpu->nanonpages--;
	}

#ifdef DEBUG
	/* Catch anyone trying to use its owner after this point.  */
	pg->uobject = (void *)0xdeadbeef;
	pg->uanon = (void *)0xdeadbeef;
#endif

	/* XXX pagedaemon pageq bookkeeping -- argh, global locks! */
	/* XXX unwire */

	/* Find the free page queue to put it on. */
	is_zero = (pg->flags & PG_ZERO) != 0;
	pgflno = uvm_page_lookup_freelist(pg);
	color = VM_PGCOLOR_BUCKET(pg);
	q = is_zero ? PGFL_ZEROS : PGFL_UNKNOWN;

#ifdef DEBUG
	/* Make sure it's zero if it claims to be.  */
	if (is_zero)
		uvm_pagezerocheck(pg);
#endif

	/*
	 * Put it on the free queue: per-CPU if there aren't too many,
	 * global if we have a lot of free pages on this queue.
	 */
	pg->pqflags = PQ_FREE;
	if (ucpu->npages[q] < uvm_pagealloc_refill_hiwat) {
		pgfreelist = &ucpu->page_free[pgflno];
		pgflist = &pgfreelist->pgfl_buckets[color].pgfl_queues[q];
		LIST_INSERT_HEAD(pgflist, pg, pageq.list);
		ucpu->npages[q]++;
		if (is_zero)
			ucpu->stats.nzeropages++;
	} else {
		mutex_spin_enter(&uvm_fpageqlock);
		pgfreelist = &uvm.page_free[pgflno];
		pgflist = &pgfreelist->pgfl_buckets[color].pgfl_queues[q];
		LIST_INSERT_HEAD(pgflist, pg, pageq.list);
		uvm.npages[q]++;
		if (is_zero)
			uvm.zeropages++;
		mutex_spin_exit(&uvm_fpageqlock);
	}

	/* All done.  Unlock the CPU.  */
	uvm_cpu_put(ucpu);
}

