tech-kern archive


Re: Prospective project for Summer of Code.



> Date: Mon, 20 Mar 2017 18:43:04 -0400
> From: Raunaq Kochar <raunaq.kochar%stonybrook.edu@localhost>
> 
> I am a Masters in Computer Science student at Stony Brook University
> looking to work as part of the Google Summer of Code on the Real
> Asynchronous I/O or Parallelize Page Queues projects.
> As I am new to the open source community, please bear with me as I try to
> answer most of the questions from the guidelines.

Hi Raunaq!  Thanks for the interest.  For a student not yet involved
in NetBSD at all, I would recommend you focus on the parallelized page
queues project, because it involves more localized changes and doesn't
cut across several different API boundaries the way aio will.

The part of NetBSD that involves parallelized page queues is uvm, the
virtual memory system, in src/sys/uvm[1].  The three parts of the
project involve different parts of uvm:

1. The *free* page queues, serialized by uvm_fpageqlock, are handled
in src/sys/uvm/uvm_page.c by the uvm_pagealloc_* functions, which find
free pages of RAM to serve as backing pages on demand.

This part is pretty well localized -- the free page queues aren't used
much outside uvm_page.c, though it does require some coordination with
the page daemon (the kernel thread that asks other subsystems to
relinquish pages currently in use in order to make them free for
others to use).
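
For orientation, a typical consumer of that path looks roughly like
the following sketch (uobj and off here are just placeholders, and
locking and retry details are elided):

	struct vm_page *pg;

	pg = uvm_pagealloc(uobj, off, NULL, UVM_PGA_ZERO);
	if (pg == NULL) {
		/* No free pages: wait for the page daemon to reclaim some. */
		uvm_wait("pgalloc");
		/* ...and retry... */
	}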

I've attached a sketch I wrote a couple years ago for a replacement
page allocation procedure with per-CPU free page queues.  It's
probably not even compile-tested -- it needs a lot of work.

2. The (in-use) page queues, serialized by uvm_pageqlock, are handled
partly in uvm_page.c, but mostly in uvm_pdpolicy_*.c.  These are
trickier: they are queues of pages that are good candidates for
freeing when the page daemon decides it needs them.
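
To get a feel for the serialization involved: essentially every place
that moves a page on or off those queues today looks roughly like this
(a sketch, with the surrounding context elided):

	mutex_enter(&uvm_pageqlock);
	uvm_pageactivate(pg);		/* move pg to the active queue */
	mutex_exit(&uvm_pageqlock);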

Changing uvm_pageqlock requires a candidate plan for how to teach the
page daemon to work independently on multiple CPUs, or something like
that.  It is not necessary to articulate such a plan in the project
application, and it's OK if this is more of an exploration than the
implementation of a definite solution, but an application should
budget time for doing something about it.

3. The vmobjlock is the most difficult part, requiring more measurement
first, as noted in the project description; for someone altogether new
to NetBSD development it may be a trifle ambitious.  It is used
(a) throughout uvm to serialize access to the pages of an object such
as a file that can be mapped into any virtual address space (a `VM
object'), (b) in reading and writing pages to backing storage for
files, e.g. src/sys/miscfs/genfs/genfs_io.c, and (c) in other parts of
the file system API `vfs'.

[1] https://nxr.netbsd.org/xref/src/sys/uvm/
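
For reference, uses (a) and (b) mostly boil down to a pattern like the
following sketch (uobj and off are placeholders), which is why this
lock shows up all over the I/O paths:

	mutex_enter(uobj->vmobjlock);
	pg = uvm_pagelookup(uobj, off);
	/* ...fault handling or I/O on pg... */
	mutex_exit(uobj->vmobjlock);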

A couple of good tests to measure the performance impact are
(a) build.sh itself, and (b) pgbench.  If you use lockstat(1) during
build.sh, e.g. start build.sh in one terminal and then run `lockstat
-o lockstat.out sleep 15' in another terminal, you will almost
certainly see the uvm_fpageqlock, uvm_pageqlock, and various
vmobjlock instances near the top in contention.

This is a good starting exercise to make sure you can measure the
basic problem -- though it may require a many-CPU system.  E.g., on my
24-way build machine, I see contention over these locks around
`build.sh -j16'.
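
Concretely, using the build.sh invocation from the build instructions
below, bumped to -j16, the measurement is just: in one terminal

% ./build.sh -O ../obj.amd64 -U -u -m amd64 -j16 release

and, while that is running, in another terminal

% lockstat -o lockstat.out sleep 15
% less lockstat.out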

> 1. I have not used CVS and hence cannot find ways to pull source code for
> NetBSD to actually have a look and play around with it? Is there a git
> repository that I could use?

Once you have installed CVS, to check out the NetBSD source, run:

% cvs -d anoncvs@anoncvs.NetBSD.org:/cvsroot co -P src

To build, run

% cd src
% ./build.sh -O ../obj.amd64 -U -u -m amd64 -j4 tools release live-image

This builds, in order, a cross-compiler toolchain (every NetBSD build
is a cross-build), the distribution sets for a full release, and
finally the live USB image that you booted in qemu.
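
(To boot that image again in qemu: it lands under
../obj.amd64/releasedir/images/ -- the exact file name depends on the
version built -- so, once you have found it, something along these
lines should work:

% cd ../obj.amd64/releasedir/images
% gunzip NetBSD-*-amd64-live.img.gz
% qemu-system-x86_64 -m 1024 -hda NetBSD-*-amd64-live.img

Adjust the memory size to taste.)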

To update src to the current version, run

% cd src
% cvs update -dP

There is also a Git mirror at <https://github.com/jsonn/src.git/>, if
you are more comfortable working with Git, and as the primary mentor
for the parallelized page queues project, I would be happy to process
Git patches.
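
If you go with the Git mirror, the initial checkout is just

% git clone https://github.com/jsonn/src.git

and `git pull' takes the place of `cvs update -dP'.
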
/*	$NetBSD$	*/

/*-
 * Copyright (c) 2015 The NetBSD Foundation, Inc.
 * All rights reserved.
 *
 * This code is derived from software contributed to The NetBSD Foundation
 * by Taylor R. Campbell.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 *
 * THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
 * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
 * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
 * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
 * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
 * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
 * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
 * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
 * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
 * POSSIBILITY OF SUCH DAMAGE.
 */

#include <sys/cdefs.h>
__KERNEL_RCSID(0, "$NetBSD$");

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/mutex.h>
#include <sys/proc.h>

#include <uvm/uvm.h>

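/*
 * uvm_cpu_get/uvm_cpu_put: Pin the calling thread to the current CPU
 * and hand back that CPU's uvm_cpu state.  Disabling kernel preemption
 * is what makes it safe to touch the per-CPU free page queues below
 * without taking a lock: no other LWP can run on this CPU until
 * uvm_cpu_put re-enables preemption.
 */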
static struct uvm_cpu *
uvm_cpu_get(void)
{

	kpreempt_disable();
	return curcpu()->ci_data.cpu_uvm;
}

static void
uvm_cpu_put(struct uvm_cpu *ucpu __diagused)
{

	KASSERT(kpreempt_disabled());
	KASSERT(ucpu == curcpu()->ci_data.cpu_uvm);
	kpreempt_enable();
}

/* Sysctl knobs.  */
unsigned	uvm_pagealloc_refill_batch = 256;	/* XXX */
unsigned	uvm_pagealloc_refill_lowat = 128;	/* XXX */
unsigned	uvm_pagealloc_refill_hiwat = 4096;	/* XXX */
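
/*
 * These knobs control the two-level scheme: when the local CPU comes
 * up empty, uvm_pagealloc_refill_cpu pulls up to
 * uvm_pagealloc_refill_batch pages over from the global queues under
 * uvm_fpageqlock; a refill is also triggered early once a CPU's free
 * page count drops below uvm_pagealloc_refill_lowat; and uvm_pagefree
 * keeps a freed page on the local CPU only while that CPU holds fewer
 * than uvm_pagealloc_refill_hiwat pages on the target queue, otherwise
 * handing it back to the global queues.
 */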

/* Global free page queue lock.  */
kmutex_t	uvm_fpageqlock;

/* Forward declarations for the allocation path below.  */
static struct vm_page *uvm_pagealloc_pgfl(struct uvm_cpu *, unsigned,
		    unsigned, unsigned, unsigned *, int);
static struct vm_page *uvm_pagealloc_pgfl_cpu(struct uvm_cpu *, unsigned,
		    unsigned, unsigned, unsigned *, int);
static struct vm_page *uvm_pagealloc_pgfl_cpu_qc(struct uvm_cpu *, unsigned,
		    unsigned, unsigned, int);
static void	uvm_pagealloc_refill_cpu(struct uvm_cpu *, unsigned, unsigned,
		    unsigned, unsigned, int);
static bool	uvm_pagealloc_refill_1(struct uvm_cpu *, struct pgflbucket *,
		    struct pgflbucket *, unsigned);

/*
 * uvm_pagealloc_strat: Allocate a page to be owned by uobj or uanon at
 * offset off, using the specified strategy.
 */
struct vm_page *
uvm_pagealloc_strat(struct uvm_object *uobj, voff_t off, struct vm_anon *uanon,
    int flags, int strategy, int stratarg)
{
	struct uvm_cpu *ucpu;
	unsigned color;
	struct lwp *l;
	unsigned q0, q1;
	unsigned pgflno;
	struct vm_page *pg;
	bool need_zero;

	KASSERT((uobj == NULL) ^ (uanon == NULL));
	KASSERT(uanon == NULL || (flags & UVM_FLAG_COLORMATCH) || off == 0);
	KASSERT(off == trunc_page(off));
	KASSERT(uobj == NULL || mutex_owned(uobj->vmobjlock));

	/* Lock the CPU.  */
	ucpu = uvm_cpu_get();

	/* Choose a color.  */
	if (flags & UVM_FLAG_COLORMATCH)
		color = atop(off) & uvmexp.colormask;
	else
		color = ucpu->page_free_nextcolor;

	/*
	 * Allow kernel and realtime threads to use the kernel reserve
	 * (but not the pagedaemon reserve).
	 *
	 * XXX OOPS!  Nothing uses this flag just yet.  Where to put
	 * it?  In uvm_pagealloc_refill_cpu, probably.
	 */
	l = curlwp;
	if (__predict_true(l != NULL) && lwp_eprio(l) >= PRI_KTHREAD)
		flags |= UVM_PGA_USERESERVE;

	/* Decide whether to prefer zero'd pages or unknown pages.  */
	CTASSERT(PGFL_NQUEUES == 2);
	if (flags & UVM_PGA_ZERO) {
		q0 = PGFL_ZEROS;
		q1 = PGFL_UNKNOWN;
	} else {
		q0 = PGFL_UNKNOWN;
		q1 = PGFL_ZEROS;
	}

	/* Apply the page allocation strategy.  */
	switch (strategy) {
	case UVM_PGA_STRAT_ONLY:	/* must be on requested freelist */
	case UVM_PGA_STRAT_FALLBACK:	/* start with requested freelist */
		KASSERT(0 <= stratarg);
		KASSERT(stratarg < VM_NFREELIST);
		pgflno = stratarg;
		pg = uvm_pagealloc_pgfl(ucpu, pgflno, q0, q1, &color, flags);
		if (pg != NULL)
			break;
		if (strategy == UVM_PGA_STRAT_ONLY)
			break;
		/* FALLTHROUGH */
	case UVM_PGA_STRAT_NORMAL:	/* try all freelists */
		for (pgflno = 0; pgflno < VM_NFREELIST; pgflno++) {
			pg = uvm_pagealloc_pgfl(ucpu, pgflno, q0, q1, &color,
			    flags);
			if (pg != NULL)
				break;
		}
		break;
	default:
		panic("uvm_pagealloc_strat: invalid strategy: %d", strategy);
	}

	if (pg == NULL)
		goto out;
	KASSERT(pg->pqflags == PQ_FREE);

	/* Remember the color for the next allocation.  */
	ucpu->page_free_nextcolor = (color + 1) & uvmexp.colormask;

	/* Account zero hits/misses, and decide whether to zero.  */
	if (flags & UVM_PGA_ZERO) {
		if (pg->flags & PG_ZERO) {
			ucpu->stats.pga_zerohit++;
			need_zero = false;
		} else {
			ucpu->stats.pga_zeromiss++;
			need_zero = true;
		}
		if (ucpu->npages[PGFL_ZEROS] < ucpu->npages[PGFL_UNKNOWN])
			ucpu->page_idle_zero = vm_page_zero_enable;
	}

	/*
	 * Assign the page's identity:
	 * - owned by specified obj or anon at specified offset
	 * - busy: busied for caller
	 * - fake: not yet initialized
	 * - clean: caller must initialize before unbusying
	 */
	pg->offset = off;
	pg->uobject = uobj;
	pg->uanon = uanon;
	pg->flags = PG_BUSY|PG_CLEAN|PG_FAKE;
	pg->pqflags = 0;
	if (uanon) {
		uanon->an_page = pg;
		pg->pqflags |= PQ_ANON;
		ucpu->nanonpages++;
	} else if (uobj) {
		uvm_pageinsert(uobj, pg);
	}
	}

	/* Set up ownership tracking.  */
#ifdef UVM_PAGE_TRKOWN
	pg->owner_tag = NULL;
#endif
	UVM_PAGE_OWN(pg, "new alloc");

	/* Zero the page if caller asked and we're not sure.  */
	if (flags & UVM_PGA_ZERO) {
		/* Zero'd pages are not clean.  (XXX Why not?)  */
		pg->flags &= ~PG_CLEAN;
		if (need_zero)
			pmap_zero_page(VM_PAGE_TO_PHYS(pg));
	}

out:	/* Unlock the CPU and return the page, if any.  */
	uvm_cpu_put(ucpu);
	return pg;
}

/*
 * uvm_pagealloc_pgfl: Allocate a page from the specified page
 * freelist, preferring queue q0 over queue q1, using *colorp exactly
 * if flags & UVM_FLAG_COLORMATCH or else trying *colorp first and
 * returning the actual color in *colorp.
 *
 * Try first from the local CPU's queues.  If no match is found,
 * refill it from the global queues
 */
static struct vm_page *
uvm_pagealloc_pgfl(struct uvm_cpu *ucpu, unsigned pgflno, unsigned q0,
    unsigned q1, unsigned *colorp, int flags)
{
	unsigned color, wantcolor = *colorp;
	struct vm_page *pg;

	KASSERT(kpreempt_disabled());

	/*
	 * Iterate over the colors outside the CPU/global alternation,
	 * if user wants colormatch.  Otherwise, we will try other
	 * colors in the local CPU queue first and in the global queues
	 * next.
	 */
	color = wantcolor;
	do {
		pg = uvm_pagealloc_pgfl_cpu(ucpu, pgflno, q0, q1, &color,
		    flags);
		if (__predict_true(pg != NULL))
			goto out;
		/*
		 * Couldn't find a matching page on the local CPU's
		 * queues.  Try to refill the local CPU's queues from
		 * the global queues.
		 */
		KASSERT(color == *colorp);
		uvm_pagealloc_refill_cpu(ucpu, pgflno, q0, q1, color, flags);
		pg = uvm_pagealloc_pgfl_cpu(ucpu, pgflno, q0, q1, &color,
		    flags);
		if (__predict_true(pg != NULL))
			goto out;
		if (!(flags & UVM_FLAG_COLORMATCH))
			break;
		color = (color + 1) & uvmexp.colormask;
	} while (color != wantcolor);

	/* Nada.  Give up.  */
	return NULL;

out:	/* Got a page.  */
	if (color == *colorp) {
		ucpu->stats.colorhit++;
	} else {
		ucpu->stats.colormiss++;
		*colorp = color;
	}

	return pg;
}

/*
 * uvm_pagealloc_pgfl_cpu: Try to grab a page from the per-CPU freelist
 * pgflno, trying first queue q0 and then queue q1.  Use *colorp exactly
 * if flags & UVM_FLAG_COLORMATCH; otherwise start with *colorp but try
 * every other color too.
 */
static inline struct vm_page *
uvm_pagealloc_pgfl_cpu(struct uvm_cpu *ucpu, unsigned pgflno, unsigned q0,
    unsigned q1, unsigned *colorp, int flags)
{
	unsigned color, startcolor = *colorp;
	struct vm_page *pg = NULL;

	KASSERT(kpreempt_disabled());

	color = startcolor;
	do {
		pg = uvm_pagealloc_pgfl_cpu_qc(ucpu, pgflno, q0, color, flags);
		if (pg != NULL)
			goto out;
		pg = uvm_pagealloc_pgfl_cpu_qc(ucpu, pgflno, q1, color, flags);
		if (pg != NULL)
			goto out;
		color = (color + 1) & uvmexp.colormask;
	} while (color != startcolor && !(flags & UVM_FLAG_COLORMATCH));

	/* Nothing found.  */
	return NULL;

out:	*colorp = color;
	return pg;
}

/*
 * uvm_pagealloc_pgfl_cpu_qc: Try to grab a page with the specified
 * color from queue q of the page freelist pgflno on the local CPU,
 * refilling from the global queues if we run low.
 */
static inline struct vm_page *
uvm_pagealloc_pgfl_cpu_qc(struct uvm_cpu *ucpu, unsigned pgflno,
    unsigned q, unsigned color, int flags)
{
	struct pgflist *pgflist;
	struct vm_page *pg;

	KASSERT(kpreempt_disabled());

	pgflist = &ucpu->page_free[pgflno].pgfl_buckets[color].pgfl_queues[q];
	pg = LIST_FIRST(pgflist);
	if (pg == NULL)
		return NULL;
	LIST_REMOVE(pg, pageq.list);

	/* Make sure the page looks sane.  */
	KASSERT(pg->pqflags & PQ_FREE);
	KASSERT(ucpu == VM_FREE_PAGE_TO_CPU(pg));

	/* Account for it.  */
	KASSERT(0 < ucpu->npages[q]);
	ucpu->npages[q]--;

	/*
	 * If we went below the low water mark, try to refill from the
	 * global queues so we'll be ready for the next allocation.
	 */
	CTASSERT(PGFL_NQUEUES == 2);
	if (ucpu->npages[q] + ucpu->npages[1 - q] < uvm_pagealloc_refill_lowat)
		uvm_pagealloc_refill_cpu(ucpu, pgflno, q, 1 - q, color, flags);

	/* Account for zero pages.  */
	if (pg->flags & PG_ZERO) {
		KASSERT(q == PGFL_ZEROS);
		ucpu->stats.nzeropages--;
	} else {
		KASSERT(q == PGFL_UNKNOWN);
	}

	return pg;
}

/*
 * uvm_pagealloc_refill_cpu: Try to refill the local CPU's queues from
 * the global queues.  Prefer queue q0, then try q1.
 *
 * XXX OOPS!  Does it make sense to grab pages for this color?  Seems
 * we ought to cycle through the colors.
 */
static void
uvm_pagealloc_refill_cpu(struct uvm_cpu *ucpu, unsigned pgflno, unsigned q0,
    unsigned q1, unsigned color, int flags)
{
	struct pgflbucket *cpu_bucket, *global_bucket;
	size_t nreq;

	KASSERT(kpreempt_disabled());

	cpu_bucket = &ucpu->page_free[pgflno].pgfl_buckets[color];
	CTASSERT(PGFL_NQUEUES == 2);

	mutex_enter(&uvm_fpageqlock);
	global_bucket = &uvm.page_free[pgflno].pgfl_buckets[color];
	nreq = uvm_pagealloc_refill_batch;
	KASSERT(0 < nreq);

	/* First try to refill from the preferred queue.  */
	while (nreq > 0) {
		if (!uvm_pagealloc_refill_1(ucpu, cpu_bucket, global_bucket,
			q0))
			break;
		nreq--;
	}

	/* If that's not enough, try to refill from the other queue.  */
	while (nreq > 0) {
		if (!uvm_pagealloc_refill_1(ucpu, cpu_bucket, global_bucket,
			q1))
			break;
		nreq--;
	}

	/* If that's still not enough, kick the pagedaemon.  */
	if (nreq > 0)
		uvm_kick_pdaemon();
	mutex_exit(&uvm_fpageqlock);
}

static bool
uvm_pagealloc_refill_1(struct uvm_cpu *ucpu, struct pgflbucket *cpu_bucket,
    struct pgflbucket *global_bucket, unsigned q)
{
	struct vm_page *pg;

	KASSERT(mutex_owned(&uvm_fpageqlock));

	pg = LIST_FIRST(&global_bucket->pgfl_queues[q]);
	if (pg == NULL)
		return false;
	LIST_REMOVE(pg, pageq.list);
	uvm.npages[q]--;
	LIST_INSERT_HEAD(&cpu_bucket->pgfl_queues[q], pg, pageq.list);
	ucpu->npages[q]++;
	return true;
}

void
uvm_pagefree(struct vm_page *pg)
{
	struct uvm_cpu *ucpu;
	struct pgfreelist *pgfreelist;
	struct pgflist *pgflist;
	unsigned pgflno;
	unsigned q;
	unsigned color;
	bool is_zero;

	/* XXX destroy loan.  */

	/* Lock the CPU.  */
	ucpu = uvm_cpu_get();

	/* Disassociate the page from its identity.  */
	if (pg->uobject != NULL) {
		KASSERT(pg->uanon == NULL); /* Can't be loaned.  */
		uvm_pageremove(pg->uobject, pg);
	} else if (pg->uanon != NULL) {
		pg->uanon->an_page = NULL;
		ucpu->nanonpages--;
	}

#ifdef DEBUG
	/* Catch anyone trying to use its owner after this point.  */
	pg->uobject = (void *)0xdeadbeef;
	pg->uanon = (void *)0xdeadbeef;
#endif

	/* XXX pagedaemon pageq bookkeeping.  (Argh!  Global locks!) */
	/* XXX unwire.  */

	/* Find the free page queue to put it on. */
	is_zero = (pg->flags & PG_ZERO);
	pgflno = uvm_page_lookup_freelist(pg);
	color = VM_PGCOLOR_BUCKET(pg);
	q = is_zero ? PGFL_ZEROS : PGFL_UNKNOWN;

#ifdef DEBUG
	/* Make sure it's zero if it claims to be.  */
	if (is_zero)
		uvm_pagezerocheck(pg);
#endif

	/*
	 * Put it on the free queue: per-CPU if there aren't too many,
	 * global if we have a lot of free pages on this queue.
	 */
	pg->pqflags = PQ_FREE;
	if (ucpu->npages[q] < uvm_pagealloc_refill_hiwat) {
		pgfreelist = &ucpu->page_free[pgflno];
		pgflist = &pgfreelist->pgfl_buckets[color].pgfl_queues[q];
		LIST_INSERT_HEAD(pgflist, pg, pageq.list);
		ucpu->npages[q]++;
		if (is_zero)
			ucpu->stats.nzeropages++;
	} else {
		mutex_spin_enter(&uvm_fpageqlock);
		pgfreelist = &uvm.page_free[pgflno];
		pgflist = &pgfreelist->pgfl_buckets[color].pgfl_queues[q];
		LIST_INSERT_HEAD(pgflist, pg, pageq.list);
		uvm.npages[q]++;
		if (is_zero)
			uvm.zeropages++;
		mutex_spin_exit(&uvm_fpageqlock);
	}

	/* All done.  Unlock the CPU.  */
	uvm_cpu_put(ucpu);
}

