Subject: Re: Increasing VM limits
To: Frank van der Linden <fvdl@netbsd.org>
From: Wolfgang Solfrank <ws@tools.de>
List: port-amd64
Date: 02/14/2005 02:59:21

Hi,

> Speaking of layout.. have you tried defining __USE_TOPDOWN_VM in
> amd64/include/vmparam.h and checking its effect? Do you still get
> the sbrk kernel messages with that enabled?

So I did try this over the weekend.  Unfortunately, no change :-(.

Now further investigation revealed the following:

1. Our current userland malloc code keeps track of memory usage on a
page-by-page basis and thus allocates one pointer per page, i.e.
8 bytes per page of memory allocated.  On amd64 this amounts to one
additional page per (4k/8 =) 512 pages malloc'ed (see
lib/libc/stdlib/malloc.c, extend_pgdir).

2. The kernel similarly keeps track of anonymous memory on a page-by-page
basis in its amaps.  It uses sizeof(int) + sizeof(int) + sizeof(int) +
sizeof(struct vm_anon *) (i.e. 20 bytes on LP64 machines) per page for
this (see src/sys/uvm/uvm_amap.c, amap_extend, esp. case 3).

This is in addition to the machine-dependent space required for page
tables and other pmap-related structures.
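
To give an idea of the scale, here is a quick back-of-the-envelope
calculation (a rough sketch only; the 120 GB is the allocation size from
the PS below, and the per-page costs are the ones described in points 1
and 2):

#include <stdio.h>

int
main(void)
{
	unsigned long long alloc = 120ULL << 30;	/* e.g. a 120 GB user allocation */
	unsigned long long pagesz = 4096;		/* amd64 PAGE_SIZE */
	unsigned long long npages = alloc / pagesz;	/* 31457280 pages */

	/* 1. userland malloc: one 8-byte pointer per page (extend_pgdir) */
	unsigned long long pgdir = npages * 8;

	/* 2. kernel amaps: 3 ints + 1 pointer = 20 bytes per page (amap_extend) */
	unsigned long long amap = npages * 20;

	printf("malloc pgdir: %llu MB\n", pgdir >> 20);	/* ~240 MB of user VM */
	printf("kernel amaps: %llu MB\n", amap >> 20);	/* ~600 MB of kernel memory */
	return 0;
}

The amap part alone is already far beyond what the default kmem limits
allow, which leads to the next point.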

On large user-level allocations, this results in the kernel running out
of virtual memory.  The limits for kmem pages, esp. NKMEMPAGES_MAX_DEFAULT,
which on amd64 allows for only 128 MB of kernel virtual memory, are too
low for this.  Increasing it to, say, 1 GB reveals another problem:

The pmap module needs to allocate memory for the pv structures that track
the mappings of physical pages.  Now the startup code (in
sys/kern/kern_malloc.c, kmeminit) allocates the kmemusage array, which
again tracks kernel memory on a page-by-page basis, before setting up the
kmem_map that is normally used for such allocations.  In order to support
this early allocation, the pmap module primes the relevant pool with just
one page's worth of pv structures.  That is only enough to track about
400 MB of kernel virtual memory (assuming, for the moment, that there were
no other early allocations).  To support the above-mentioned 1 GB of
kernel virtual memory, I had to prime the pool with a few additional pages.
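
To make the chain of numbers explicit (again only a rough sketch; the
exact sizeof(struct kmemusage) and the number of pv entries per pv_page
are approximate here, but they show where the "about 400 MB" comes from):

#include <stdio.h>

int
main(void)
{
	unsigned long pagesz = 4096;		/* amd64 PAGE_SIZE */
	unsigned long ku_size = 4;		/* ~sizeof(struct kmemusage), approximate */
	unsigned long pve_per_page = 100;	/* pv entries per pv_page, roughly */

	/*
	 * One primed pv_page can map ~100 pages of the early kmemusage
	 * array; each of those pages describes pagesz/ku_size kmem pages.
	 */
	unsigned long covered = pve_per_page * (pagesz / ku_size) * pagesz;
	printf("one pv_page covers ~%lu MB of kmem VA\n", covered >> 20);	/* ~400 MB */

	/* pv_pages needed once NKMEMPAGES_MAX_DEFAULT allows 1 GB of kmem VA */
	unsigned long want = 1UL << 30;
	printf("pv_pages needed for 1 GB: %lu\n", (want + covered - 1) / covered);	/* 3 */
	return 0;
}

That's why the patch below primes the pool with a few pages (NINITPGS)
instead of just one.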

All in all, I'm not happy with this page-by-page tracking of memory usage
in several different places, especially on machines with such large
address spaces and relatively small page sizes.  Quite a lot of memory is
lost to this housekeeping.  There should be some special mechanism for
large, uniformly mapped memory areas to reduce it.

Ciao,
Wolfgang

PS: Attached please find the diffs I'm currently running.  With these
(and of course an increased MAXDSIZ) I'm now able to malloc up to about
120 GB of memory in a single program.  Note that on the same box, Linux
is able to malloc something like 500 GB in a similar case.
-- 
ws@TooLs.DE                            Wolfgang Solfrank, TooLs GmbH


Index: amd64/pmap.c
===================================================================
RCS file: /cvsroot/src/sys/arch/amd64/amd64/pmap.c,v
retrieving revision 1.15
diff -u -r1.15 pmap.c
--- amd64/pmap.c	1 Jan 2005 21:00:06 -0000	1.15
+++ amd64/pmap.c	14 Feb 2005 01:48:33 -0000
@@ -429,7 +429,8 @@
 static struct pv_pagelist pv_freepages;	/* list of pv_pages with free entrys */
 static struct pv_pagelist pv_unusedpgs; /* list of unused pv_pages */
 static int pv_nfpvents;			/* # of free pv entries */
-static struct pv_page *pv_initpage;	/* bootstrap page from kernel_map */
+#define	NINITPGS	4
+static struct pv_page *pv_initpage[NINITPGS];	/* bootstrap pages from kernel_map */
 static vaddr_t pv_cachedva;		/* cached VA for later use */
 
 #define PVE_LOWAT (PVE_PER_PVPAGE / 2)	/* free pv_entry low water mark */
@@ -1312,12 +1313,15 @@
 	 * structures.   we never free this page.
 	 */
 
-	pv_initpage = (struct pv_page *) uvm_km_alloc(kernel_map, PAGE_SIZE);
-	if (pv_initpage == NULL)
-		panic("pmap_init: pv_initpage");
 	pv_cachedva = 0;   /* a VA we have allocated but not used yet */
 	pv_nfpvents = 0;
-	(void) pmap_add_pvpage(pv_initpage, FALSE);
+	for (i = 0; i < NINITPGS; i++) {
+		pv_initpage[i] = (struct pv_page *) uvm_km_alloc(kernel_map,
+				PAGE_SIZE);
+		if (pv_initpage[i] == NULL)
+			panic("pmap_init: pv_initpage");
+		(void) pmap_add_pvpage(pv_initpage[i], FALSE);
+	}
 
 	pj_page = (void *)uvm_km_alloc(kernel_map, PAGE_SIZE);
 	if (pj_page == NULL)
@@ -1626,7 +1630,7 @@
 static void
 pmap_free_pvpage()
 {
-	int s;
+	int s, initpg;
 	struct vm_map *map;
 	struct vm_map_entry *dead_entries;
 	struct pv_page *pvp;
@@ -1640,10 +1644,12 @@
 	 * kernel_map rather than kmem_map.
 	 */
 
-	if (pvp == pv_initpage)
-		map = kernel_map;
-	else
-		map = kmem_map;
+	map = kmem_map;
+	for (initpg = 0; initpg < NINITPGS; initpg++)
+		if (pvp == pv_initpage[initpg]) {
+			map = kernel_map;
+			break;
+		}
 	if (vm_map_lock_try(map)) {
 
 		/* remove pvp from pv_unusedpgs */
@@ -1660,9 +1666,9 @@
 
 		pv_nfpvents -= PVE_PER_PVPAGE;  /* update free count */
 	}
-	if (pvp == pv_initpage)
+	if (initpg < NINITPGS)
 		/* no more initpage, we've freed it */
-		pv_initpage = NULL;
+		pv_initpage[initpg] = NULL;
 
 	splx(s);
 }
Index: include/param.h
===================================================================
RCS file: /cvsroot/src/sys/arch/amd64/include/param.h,v
retrieving revision 1.1
diff -u -r1.1 param.h
--- include/param.h	26 Apr 2003 18:39:45 -0000	1.1
+++ include/param.h	14 Feb 2005 01:48:33 -0000
@@ -107,7 +107,7 @@
  * logical pages.
  */
 #define	NKMEMPAGES_MIN_DEFAULT	((8 * 1024 * 1024) >> PAGE_SHIFT)
-#define	NKMEMPAGES_MAX_DEFAULT	((128 * 1024 * 1024) >> PAGE_SHIFT)
+#define	NKMEMPAGES_MAX_DEFAULT	((1 * 1024 * 1024 * 1024) >> PAGE_SHIFT)
 
 /* pages ("clicks") to disk blocks */
 #define	ctod(x)		((x) << (PGSHIFT - DEV_BSHIFT))
Index: include/vmparam.h
===================================================================
RCS file: /cvsroot/src/sys/arch/amd64/include/vmparam.h,v
retrieving revision 1.7
diff -u -r1.7 vmparam.h
--- include/vmparam.h	11 Feb 2005 11:01:10 -0000	1.7
+++ include/vmparam.h	14 Feb 2005 01:48:34 -0000
@@ -105,6 +105,8 @@
 
 #define VM_MAXUSER_ADDRESS32	0xfffff000
 
+#define	__USE_TOPDOWN_VM
+
 /*
  * XXXfvdl we have plenty of KVM now, remove this.
  */
