Subject: free page management
To: None <tech-kern@NetBSD.ORG>
From: Jason Thorpe <thorpej@nas.nasa.gov>
List: tech-kern
Date: 03/09/1997 14:32:51
Hi folks... before I go out to hack on my bike, I figured I'd throw
this out to the wolves (the theory being that I'd have lots of comments
in my mailbox by the time I get back :-)

One of the things I need for the MI DMA mapping code is a way to allocate
regions of physical memory with special constraints.  For example, I
need to be able to say, in a machine-independent way (since a few ports
will need to do this):

    Give me a 16k contiguous chunk, aligned to 8k, that doesn't cross a
    64k line, and make sure it lies between 0 -> 16M.

Now, it just so happens that the extent code provides this (big surprise! :-)
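
(As I read the extent(9) interface, the example above maps pretty much
directly onto extent_alloc_subregion()... something like the following,
where "dma_ex" is a made-up extent covering the bus address space and
the error handling is hand-waved:)

    u_long result;
    int error;

    /* 16k chunk, 8k aligned, between 0 -> 16M, don't cross a 64k line */
    error = extent_alloc_subregion(dma_ex,
        0, (16 * 1024 * 1024) - 1,          /* subregion: 0 -> 16M */
        16 * 1024,                          /* size */
        8 * 1024,                           /* alignment */
        64 * 1024,                          /* boundary */
        EX_NOWAIT, &result);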

However, the way free pages are managed by the VM system isn't terribly
conducive to flexible allocation policies.  It's designed to grab a page,
quickly, from a free list, which is currently fine for 100% of the cases of
post-bootstrap dynamic memory allocation in the kernel (there are some
special hacks for dynamic memory allocation before the VM system is
bootstrapped, but the pages those allocators allocate are not managed).

So, the obvious right thing to do here is to use an extent map to track
free/allocated managed pages.  However, there's a problem with this... the
vm_page structures on the free/active/inactive lists contain a great deal
more information than just the address of the page, such as the associated
object, dirty bits, etc.

So, what I'm proposing to do is to use both: an extent map to be used by
flexible allocators, and the free list to be used by the rest of the
system (i.e., fault handlers, kmem allocators, etc.).  Consistency between
the extent map and the free list will be enforced by the functions that
directly manipulate the free list and the extent map.

Attached below are diffs that implement the extent map and consistency
with the current physical memory allocator (vm_page_alloc(), vm_page_free()).
I haven't yet implemented the Way Cool Allocator that I need for the
DMA code... I want to get some sanity checking on this step before I
do too much work on it :-)
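
(To give an idea of the shape of that allocator, here's a rough sketch of
how it might sit on top of the extent map... the function name and the use
of PHYS_TO_VM_PAGE() to get back at the vm_page structures are purely
illustrative, since none of this is written yet:)

    /*
     * Sketch: allocate `size' bytes of contiguous physical memory
     * satisfying the caller's constraints, keeping the free list
     * consistent with the extent map.
     */
    int
    vm_page_alloc_contig(size, low, high, alignment, boundary, rpa)
        vm_size_t       size;
        vm_offset_t     low, high, alignment, boundary;
        vm_offset_t     *rpa;
    {
        u_long pa;
        vm_offset_t off;
        vm_page_t m;
        int error, spl;

        spl = splimp();
        simple_lock(&vm_page_queue_free_lock);

        error = extent_alloc_subregion(vm_page_ex, low, high, size,
            alignment, boundary, EX_NOWAIT, &pa);
        if (error == 0) {
            /*
             * Pull the corresponding vm_page structures off the
             * free list, so that the free list and the extent map
             * continue to describe the same set of pages.
             */
            for (off = 0; off < size; off += PAGE_SIZE) {
                m = PHYS_TO_VM_PAGE(pa + off);
                TAILQ_REMOVE(&vm_page_queue_free, m, pageq);
                cnt.v_free_count--;
            }
            *rpa = (vm_offset_t)pa;
        }

        simple_unlock(&vm_page_queue_free_lock);
        splx(spl);
        return (error);
    }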

Constructive feedback is appreciated... Please pay special attention
to the "XXX" comments I added... those are spots where I need a little
extra sanity checking.  This code is running on a few of my machines, and
seems to be working fine... I plan on building profiling versions of the
kernel with and without the consistency code activated, to measure just
how expensive all of this is.

Ciao.

Jason R. Thorpe                                       thorpej@nas.nasa.gov
NASA Ames Research Center                               Home: 408.866.1912
NAS: M/S 258-6                                          Work: 415.604.0935
Moffett Field, CA 94035                                Pager: 415.428.6939

Index: vm_page.c
===================================================================
RCS file: /mastersrc/netbsd/src/sys/vm/vm_page.c,v
retrieving revision 1.1.1.2
retrieving revision 1.2
diff -c -r1.1.1.2 -r1.2
*** vm_page.c	1997/01/07 01:23:23	1.1.1.2
--- vm_page.c	1997/03/08 20:13:55	1.2
***************
*** 1,6 ****
--- 1,7 ----
  /*	$NetBSD: vm_page.c,v 1.30 1997/01/03 18:03:33 mrg Exp $	*/
  
  /* 
+  * Copyright (c) 1997 Jason R. Thorpe.  All rights reserved.
   * Copyright (c) 1991, 1993
   *	The Regents of the University of California.  All rights reserved.
   *
***************
*** 71,76 ****
--- 72,79 ----
  #include <sys/param.h>
  #include <sys/systm.h>
  #include <sys/proc.h>
+ #include <sys/malloc.h>
+ #include <sys/extent.h>
  
  #include <vm/vm.h>
  #include <vm/vm_page.h>
***************
*** 105,110 ****
--- 108,133 ----
  simple_lock_data_t	vm_page_queue_lock;
  simple_lock_data_t	vm_page_queue_free_lock;
  
+ /*
+  *	This extent map is provided for more flexible physical
+  *	memory allocators.  Consistency with the free list
+  *	is maintained.
+  *
+  *	We need "managed page count" region descriptors, plus
+  *	one, since we allocate the entire number range of the
+  *	extent, then free chunks as we add pages to the free list.
+  *
+  *	XXX Hmm, is it really "(managed page count / 2) + 1"?
+  *
+  *	XXX We have to add region descriptors to the fixed extent
+  *	XXX manually because of the way the page count is computed.
+  */
+ 
+ static long vm_page_ex_storage[EXTENT_FIXED_STORAGE_SIZE(1) / sizeof(long)];
+ struct extent	*vm_page_ex;
+ 
+ void	vm_page_init_extent __P((vm_size_t, vm_offset_t));
+ 
  /* has physical page allocation been initialized? */
  boolean_t vm_page_startup_initialized;
  
***************
*** 144,149 ****
--- 167,219 ----
  			break;
  }
  
+ /*
+  *	vm_page_init_extent:
+  *
+  *	Creates and initializes the managed memory extent map.
+  */
+ void
+ vm_page_init_extent(npages, region_storage)
+ 	vm_size_t	npages;
+ 	vm_offset_t	region_storage;
+ {
+ 	struct extent_fixed	*fex;
+ 	struct extent_region	*rp;
+ 
+ 	/*
+ 	 *	Create the extent map to manage the entire number
+ 	 *	range.  Regions that are later freed represent
+ 	 *	free pages in the system.
+ 	 */
+ 
+ 	vm_page_ex = extent_create("vm_page", (u_long)0, (u_long)~0, M_VMMAP,
+ 	    (caddr_t)vm_page_ex_storage, sizeof(vm_page_ex_storage),
+ 	    EX_NOWAIT);
+ 	fex = (struct extent_fixed *)vm_page_ex;
+ 
+ 	/*
+ 	 *	Add our specially-allocated region descriptors
+ 	 *	to the extent map's free list.
+ 	 */
+ 
+ 	bzero((void *)region_storage,
+ 	    npages * ALIGN(sizeof(struct extent_region)));
+ 	
+ 	while (npages--) {
+ 		rp = (struct extent_region *)region_storage;
+ 		region_storage += ALIGN(sizeof(struct extent_region));
+ 		LIST_INSERT_HEAD(&fex->fex_freelist, rp, er_link);
+ 	}
+ 
+ 	/*
+ 	 *	Allocate the entire number range of the extent.
+ 	 *	This allows us to free regions as we add pages
+ 	 *	to the free list, resulting in a description of
+ 	 *	the free memory regions.
+ 	 */
+ 	if (extent_alloc_region(vm_page_ex, (u_long)0, (u_long)~0, EX_NOWAIT))
+ 		panic("vm_page_init_extent: can't allocate number range");
+ }
  
  #ifdef	MACHINE_NONCONTIG
  /*
***************
*** 277,282 ****
--- 347,353 ----
  	register struct pglist	*bucket;
  	vm_size_t		npages;
  	int			i;
+ 	vm_offset_t		ex_regions;
  	vm_offset_t		pa;
  	extern	vm_offset_t	kentry_data;
  	extern	vm_size_t	kentry_data_size;
***************
*** 356,366 ****
  	/*
   	 *	Compute the number of pages of memory that will be
  	 *	available for use (taking into account the overhead
! 	 *	of a page structure per page).
  	 */
  
! 	cnt.v_free_count = npages = (*end - *start + sizeof(struct vm_page))
! 		/ (PAGE_SIZE + sizeof(struct vm_page));
  
  	/*
  	 *	Record the extent of physical memory that the
--- 427,441 ----
  	/*
   	 *	Compute the number of pages of memory that will be
  	 *	available for use (taking into account the overhead
! 	 *	of a page structure and extent region per page).
  	 */
  
! 	npages = (*end - *start + sizeof(struct vm_page) +
! 	    ALIGN(sizeof(struct extent_region))) /
! 	    (PAGE_SIZE + sizeof(struct vm_page) +
! 	    ALIGN(sizeof(struct extent_region)));
! 
! 	cnt.v_free_count = npages;
  
  	/*
  	 *	Record the extent of physical memory that the
***************
*** 369,374 ****
--- 444,450 ----
  
  	first_page = *start;
  	first_page += npages*sizeof(struct vm_page);
+ 	first_page += npages*ALIGN(sizeof(struct extent_region));
  	first_page = atop(round_page(first_page));
  	last_page  = first_page + npages - 1;
  
***************
*** 384,389 ****
--- 460,473 ----
  		pmap_bootstrap_alloc(npages * sizeof(struct vm_page));
  
  	/*
+ 	 *	Create and initialize the managed region extent map.
+ 	 */
+ 
+ 	ex_regions = (vm_offset_t)
+ 	    pmap_bootstrap_alloc(npages * ALIGN(sizeof(struct extent_region)));
+ 	vm_page_init_extent(npages, ex_regions);
+ 
+ 	/*
  	 *	Initialize the mem entry structures now, and
  	 *	put them in the free queue.
  	 */
***************
*** 395,400 ****
--- 479,488 ----
  		m->phys_addr = pa;
  		TAILQ_INSERT_TAIL(&vm_page_queue_free, m, pageq);
  		m++;
+ 		if (extent_free(vm_page_ex, pa, PAGE_SIZE, EX_NOWAIT)) {
+ 			printf("can't free page at 0x%lx\n", pa);
+ 			panic("vm_page_startup");
+ 		}
  		pa += PAGE_SIZE;
  	}
  
***************
*** 483,488 ****
--- 571,577 ----
  	vm_offset_t	*endp;
  {
  	unsigned int	i, freepages;
+ 	vm_offset_t	ex_regions;
  	vm_offset_t	paddr;
  	
  	/*
***************
*** 505,515 ****
  	freepages += 1;	/* fudge */
  
  	vm_page_count = (PAGE_SIZE * freepages) /
! 		(PAGE_SIZE + sizeof(*vm_page_array));
  
  	vm_page_array = (vm_page_t)
  		pmap_steal_memory(vm_page_count * sizeof(*vm_page_array));
  
  #ifdef	DIAGNOSTIC
  	/*
  	 * Initialize everyting in case the holes are stepped in,
--- 594,610 ----
  	freepages += 1;	/* fudge */
  
  	vm_page_count = (PAGE_SIZE * freepages) /
! 		(PAGE_SIZE + sizeof(*vm_page_array) +
! 		ALIGN(sizeof(struct extent_region)));
  
  	vm_page_array = (vm_page_t)
  		pmap_steal_memory(vm_page_count * sizeof(*vm_page_array));
  
+ 	ex_regions = (vm_offset_t)
+ 		pmap_steal_memory(vm_page_count *
+ 		  ALIGN(sizeof(struct extent_region)));
+ 	vm_page_init_extent(vm_page_count, ex_regions);
+ 
  #ifdef	DIAGNOSTIC
  	/*
  	 * Initialize everyting in case the holes are stepped in,
***************
*** 765,770 ****
--- 860,872 ----
  	mem = vm_page_queue_free.tqh_first;
  	TAILQ_REMOVE(&vm_page_queue_free, mem, pageq);
  
+ 	if (extent_alloc_region(vm_page_ex, mem->phys_addr,
+ 	    PAGE_SIZE, EX_NOWAIT)) {
+ 		printf("can't allocate page at 0x%lx\n",
+ 		    mem->phys_addr);
+ 		panic("vm_page_alloc");
+ 	}
+ 
  	cnt.v_free_count--;
  	simple_unlock(&vm_page_queue_free_lock);
  	splx(spl);
***************
*** 821,826 ****
--- 923,935 ----
  		spl = splimp();
  		simple_lock(&vm_page_queue_free_lock);
  		TAILQ_INSERT_TAIL(&vm_page_queue_free, mem, pageq);
+ 
+ 		if (extent_free(vm_page_ex, mem->phys_addr,
+ 		    PAGE_SIZE, EX_NOWAIT)) {
+ 			printf("can't free page at 0x%lx\n",
+ 			    mem->phys_addr);
+ 			panic("vm_page_free");
+ 		}
  
  		cnt.v_free_count++;
  		simple_unlock(&vm_page_queue_free_lock);