Subject: Re: free page management
To: Chris G. Demetriou <cgd@cs.cmu.edu>
From: Jason Thorpe <thorpej@nas.nasa.gov>
List: tech-kern
Date: 03/12/1997 12:52:16
On Mon, 10 Mar 1997 19:39:52 -0500 
 "Chris G. Demetriou" <cgd@cs.cmu.edu> wrote:

 > So, what you're suggesting is to add overhead to the (very) common
 > case for what's hoped to be infrequent operations on a
 > hopefully-decreasing set of architectures?

Good point :-)

 > I'm also concerned that since the implementation of the extent manager
 > involves potentially long linear linked list searches, each extent
 > manager call could be ... painfully slow, adding to the overall added
 > inefficiency.

...yah, I had this concern, too.  But, on the other hand, I was hoping
to be able to use the extent code's allocators.  Sigh.

 > You've obviously implemented this, have you run any tests to see how
 > this performs in various conditions (e.g. compile on small-memory
 > machine, compile on medium-memory machine, and compile in large-memory
 > machine)?

I haven't yet run it on a small-memory machine, but am running it on
large-memory and medium-memory machines (128M P6/200, 32M DEC 3000/400).
I had planned on building profiling versions of those kernels and
measuring the extra cost incurred by the consistency code.  I agree
that the cost would increase over time: after a number of faults
occur, the head of the free list tends to hold pages toward the end
of physical memory.  When pages are freed back, they go onto the end
of the free list.  There comes a point where you tend to be using
pages toward the end of physical memory, so the extent map lookups
would get long, and then fluctuate, depending on which pages were
being recycled.
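
To make that concrete: the free list is FIFO.  The allocate and free
paths look roughly like this (same idioms as in the diffs below):

    /* allocate: pop the page at the head of the free list */
    mem = vm_page_queue_free.tqh_first;
    TAILQ_REMOVE(&vm_page_queue_free, mem, pageq);

    /* free: put the page back at the tail */
    TAILQ_INSERT_TAIL(&vm_page_queue_free, mem, pageq);

Since the list starts out sorted by ascending physical address, the
low pages get handed out first and recycled pages accumulate at the
tail, so the head drifts toward high physical addresses, which are
exactly the addresses a linear extent search reaches last.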

 > Obviously, optimization is premature, but i'm concerned about the
 > possibility of expensive, repetitive lookups (e.g. you pull the thing
 > off the head of the free list, and have to do a long linear lookup to
 > find it in the extent map, you pull the next thing off the free list
 > and have to do the same thing).

So, like a couple of other folks, I began to feel less happy about
using the extent code in this case.

 > I'd say that what you really want might be a #-of-managed-pages-long
 > bitmap, which marks pages as in-use or not.  with something like that,
 > you can easily find available regions, mark pages in-use or
 > not-in-use, and translate from "available page" to vm_page structure
 > and vice-versa...

Yah, I agree with this, and have implemented it.  It definitely makes
the code simpler, and it makes consistency in the other direction
much simpler/faster, as well (the whole thing is based on
VM_PAGE_INDEX(), which yields the index in vm_page_array[] of a PA's
vm_page structure).
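
For example, a consistency check in the free-list direction boils
down to a PA-to-index-to-bit translation, with no linear search.
This is purely illustrative, not part of the diffs; the macros are
defined below:

    /* hypothetical check: is this free-list page marked allocated? */
    if (VM_PAGE_IS_ALLOCATED(VM_PAGE_INDEX(VM_PAGE_TO_PHYS(mem))))
        panic("free page marked allocated in bitmap");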

Anyhow, the diffs are attached below.  Please sanity-check them and
comment :-)
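
FWIW, a flexible physical memory allocator consuming the bitmap might
look for a run of contiguous free pages roughly like this.  This is
just a sketch of the intended use, not part of the diffs, and
vm_page_find_contig() is a made-up name:

    /*
     * Find `npgs' contiguous free pages; return the page index of
     * the first, or -1 if no such run exists.
     */
    int
    vm_page_find_contig(npgs)
        int npgs;
    {
        int i, run, nbits;

        /* vm_page_bitmap_size is in bytes */
        nbits = vm_page_bitmap_size * NBBY;

        for (i = 0, run = 0; i < nbits; i++) {
            if (VM_PAGE_IS_ALLOCATED(i))
                run = 0;
            else if (++run == npgs)
                return (i - npgs + 1);
        }
        return (-1);
    }

Note that vm_page_init_bitmap() leaves any pad bits past the last
managed page marked allocated, so the scan can't return a run that
falls off the end of physical memory.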

Jason R. Thorpe                                       thorpej@nas.nasa.gov
NASA Ames Research Center                               Home: 408.866.1912
NAS: M/S 258-6                                          Work: 415.604.0935
Moffett Field, CA 94035                                Pager: 415.428.6939

Index: vm_page.c
===================================================================
RCS file: /mastersrc/netbsd/src/sys/vm/vm_page.c,v
retrieving revision 1.1.1.2
retrieving revision 1.5
diff -c -r1.1.1.2 -r1.5
*** vm_page.c	1997/01/07 01:23:23	1.1.1.2
--- vm_page.c	1997/03/12 19:34:55	1.5
***************
*** 1,6 ****
--- 1,7 ----
  /*	$NetBSD: vm_page.c,v 1.30 1997/01/03 18:03:33 mrg Exp $	*/
  
  /* 
+  * Copyright (c) 1997 Jason R. Thorpe.  All rights reserved.
   * Copyright (c) 1991, 1993
   *	The Regents of the University of California.  All rights reserved.
   *
***************
*** 105,110 ****
--- 106,135 ----
  simple_lock_data_t	vm_page_queue_lock;
  simple_lock_data_t	vm_page_queue_free_lock;
  
+ /*
+  *	This bitmap is used to track allocated pages, and is provided
+  *	for use by flexible physical memory allocators.
+  */
+ 
+ int		*vm_page_bitmap;
+ vm_size_t	vm_page_bitmap_size;
+ 
+ #define	NVMPGBITS	(sizeof(int) * NBBY)
+ 
+ #define	VM_PAGE_SET_BIT(page_index)				\
+ 	(vm_page_bitmap[(page_index)/NVMPGBITS] |=		\
+ 	    (1 << ((page_index) % NVMPGBITS)))
+ 
+ #define	VM_PAGE_CLEAR_BIT(page_index)				\
+ 	(vm_page_bitmap[(page_index)/NVMPGBITS] &= 		\
+ 	    ~(1 << ((page_index) % NVMPGBITS)))
+ 
+ #define	VM_PAGE_IS_ALLOCATED(page_index)			\
+ 	(vm_page_bitmap[(page_index)/NVMPGBITS] &		\
+ 	    (1 << ((page_index) % NVMPGBITS)))
+ 
+ void	vm_page_init_bitmap __P((void));
+ 
  /* has physical page allocation been initialized? */
  boolean_t vm_page_startup_initialized;
  
***************
*** 144,149 ****
--- 169,194 ----
  			break;
  }
  
+ /*
+  *	vm_page_init_bitmap:
+  *
+  *	Initializes the managed page bitmap.
+  */
+ void
+ vm_page_init_bitmap()
+ {
+ 	int i, count;
+ 
+ 	/*
+ 	 *	Mark all pages allocated.  As the free list
+ 	 *	is initialized, the free pages will be cleared
+ 	 *	in the map.
+ 	 */
+ 
+ 	count = vm_page_bitmap_size / sizeof(int);
+ 	for (i = 0; i < count; i++)
+ 		vm_page_bitmap[i] = (u_int)~0;
+ }
  
  #ifdef	MACHINE_NONCONTIG
  /*
***************
*** 359,399 ****
  	 *	of a page structure per page).
  	 */
  
! 	cnt.v_free_count = npages = (*end - *start + sizeof(struct vm_page))
! 		/ (PAGE_SIZE + sizeof(struct vm_page));
  
  	/*
! 	 *	Record the extent of physical memory that the
! 	 *	virtual memory system manages.
  	 */
  
! 	first_page = *start;
! 	first_page += npages*sizeof(struct vm_page);
! 	first_page = atop(round_page(first_page));
! 	last_page  = first_page + npages - 1;
  
! 	first_phys_addr = ptoa(first_page);
! 	last_phys_addr  = ptoa(last_page) + PAGE_MASK;
  
  
  	/*
! 	 *	Allocate and clear the mem entry structures.
  	 */
  
! 	m = vm_page_array = (vm_page_t)
! 		pmap_bootstrap_alloc(npages * sizeof(struct vm_page));
  
  	/*
  	 *	Initialize the mem entry structures now, and
  	 *	put them in the free queue.
  	 */
  
! 	pa = first_phys_addr;
! 	while (npages--) {
  		m->flags = 0;
  		m->object = NULL;
  		m->phys_addr = pa;
  		TAILQ_INSERT_TAIL(&vm_page_queue_free, m, pageq);
  		m++;
  		pa += PAGE_SIZE;
  	}
--- 404,455 ----
  	 *	of a page structure per page).
  	 */
  
! 	npages = (*end - *start + sizeof(struct vm_page)) /
! 	    (PAGE_SIZE + sizeof(struct vm_page));
  
  	/*
! 	 *	Allocate and clear the mem entry structures.
  	 */
  
! 	m = vm_page_array = (vm_page_t)
! 		pmap_bootstrap_alloc(npages * sizeof(struct vm_page));
  
! 	/*
! 	 *	Create and initialize the managed page bitmap.
! 	 */
  
+ 	vm_page_bitmap_size = howmany(npages, NVMPGBITS) * sizeof(int);
+ 	vm_page_bitmap = (int *) pmap_bootstrap_alloc(vm_page_bitmap_size);
+ 	vm_page_init_bitmap();
  
  	/*
! 	 *	Now that we've stolen all of the necessary pages
! 	 *	used for overhead, record the extent of physical
! 	 *	memory that the virtual memory system manages.
! 	 *
! 	 *	XXX Possible waste in last_page computation because
! 	 *	XXX of how last_phys_addr is computed.
  	 */
  
! 	first_page = atop(round_page(*start));
! 	last_page  = atop(trunc_page(*end)) - 1;
! 
! 	first_phys_addr = ptoa(first_page);
! 	last_phys_addr  = ptoa(last_page) + PAGE_MASK;
  
  	/*
  	 *	Initialize the mem entry structures now, and
  	 *	put them in the free queue.
  	 */
  
! 	for (pa = first_phys_addr, cnt.v_free_count = 0;
! 	    pa < last_phys_addr; ) {
  		m->flags = 0;
  		m->object = NULL;
  		m->phys_addr = pa;
  		TAILQ_INSERT_TAIL(&vm_page_queue_free, m, pageq);
+ 		VM_PAGE_CLEAR_BIT(VM_PAGE_INDEX(pa));
+ 		cnt.v_free_count++;
  		m++;
  		pa += PAGE_SIZE;
  	}
***************
*** 510,515 ****
--- 566,578 ----
  	vm_page_array = (vm_page_t)
  		pmap_steal_memory(vm_page_count * sizeof(*vm_page_array));
  
+ 	/*
+ 	 * Allocate and initialize the managed page bitmap.
+ 	 */
+ 	vm_page_bitmap_size = howmany(vm_page_count, NVMPGBITS) * sizeof(int);
+ 	vm_page_bitmap = (int *) pmap_steal_memory(vm_page_bitmap_size);
+ 	vm_page_init_bitmap();
+ 
  #ifdef	DIAGNOSTIC
  	/*
  	 * Initialize everything in case the holes are stepped in,
***************
*** 765,770 ****
--- 828,835 ----
  	mem = vm_page_queue_free.tqh_first;
  	TAILQ_REMOVE(&vm_page_queue_free, mem, pageq);
  
+ 	VM_PAGE_SET_BIT(VM_PAGE_INDEX(VM_PAGE_TO_PHYS(mem)));
+ 
  	cnt.v_free_count--;
  	simple_unlock(&vm_page_queue_free_lock);
  	splx(spl);
***************
*** 821,826 ****
--- 886,893 ----
  		spl = splimp();
  		simple_lock(&vm_page_queue_free_lock);
  		TAILQ_INSERT_TAIL(&vm_page_queue_free, mem, pageq);
+ 
+ 		VM_PAGE_CLEAR_BIT(VM_PAGE_INDEX(VM_PAGE_TO_PHYS(mem)));
  
  		cnt.v_free_count++;
  		simple_unlock(&vm_page_queue_free_lock);
Index: vm_page.h
===================================================================
RCS file: /mastersrc/netbsd/src/sys/vm/vm_page.h,v
retrieving revision 1.1.1.2
retrieving revision 1.3
diff -c -r1.1.1.2 -r1.3
*** vm_page.h	1997/01/07 01:23:23	1.1.1.2
--- vm_page.h	1997/03/12 20:42:09	1.3
***************
*** 219,233 ****
  #define IS_VM_PHYSADDR(pa) \
  		((pa) >= first_phys_addr && (pa) <= last_phys_addr)
  
! #define PHYS_TO_VM_PAGE(pa) \
! 		(&vm_page_array[atop(pa) - first_page ])
  #else
  #define IS_VM_PHYSADDR(pa) \
  		(pmap_page_index(pa) >= 0)
  
! #define PHYS_TO_VM_PAGE(pa) \
! 		(&vm_page_array[pmap_page_index(pa) - first_page])
  #endif /* MACHINE_NONCONTIG */
  
  extern
  simple_lock_data_t	vm_page_queue_lock;	/* lock on active and inactive
--- 219,239 ----
  #define IS_VM_PHYSADDR(pa) \
  		((pa) >= first_phys_addr && (pa) <= last_phys_addr)
  
! #define	VM_PAGE_INDEX(pa) \
! 		(atop((pa)) - first_page)
  #else
  #define IS_VM_PHYSADDR(pa) \
  		(pmap_page_index(pa) >= 0)
  
! #define	VM_PAGE_INDEX(pa) \
! 		(pmap_page_index((pa)) - first_page)
  #endif /* MACHINE_NONCONTIG */
+ 
+ #define	INDEX_TO_VM_PAGE(idx) \
+ 		(&vm_page_array[(idx)])
+ 
+ #define	PHYS_TO_VM_PAGE(pa) \
+ 		INDEX_TO_VM_PAGE(VM_PAGE_INDEX((pa)))
  
  extern
  simple_lock_data_t	vm_page_queue_lock;	/* lock on active and inactive