Subject: Re: -current malloc(3) is slow
To: Charles M. Hannum <root@ihack.net>
From: Simon Burge <simonb@netbsd.org>
List: tech-userlevel
Date: 09/13/1999 23:45:44
"Charles M. Hannum" wrote:

> So, there are several ways in which the `new' malloc(3) is quite slow:
> 
> * realloc(3) ~always copies; it doesn't know how to extend or shrink a
>   region (except in very special cases, like going from NBPG+1 to
>   NBPG*2).
> 
>   This should be trivially solvable by just freeing the tail end or
>   checking whether the pages after the object are available.

I haven't looked much at this yet, other than to note that our old
malloc seems to have a similar problem.

> * realloc(3) and free(3) do a bunch of gratuitous O(n) (where `n' is
>   the number of pages in the object) operations when dealing with
>   multi-page chunks, just to figure out how large the chunk is!
> 
>   To solve this (but not add extra overhead), I suggest replacing
>   MALLOC_FIRST/MALLOC_FOLLOW with MALLOC_ONE/MALLOC_MANY:
> 
>   + If the chunk is one page, the table entry will contain MALLOC_ONE.
> 
>   + If the chunk is more than one page, the table entry for the first
>     page will contain MALLOC_MANY, the next entry will contain the
>     actual number of pages, and the remaining entries will be left as
>     MALLOC_FREE so that they do not need to be touched later.
> 
>   This will interfere slightly with some types of error detection, but
>   this seems like a small price to pay for better performance.
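[The proposed encoding is easy to see in isolation.  The sketch below
is a stand-alone toy, not the real allocator state -- the names mirror
those in malloc.c, but the array and helpers are illustrative only.
The point is that finding a chunk's size becomes a constant-time
lookup instead of a walk over MALLOC_FOLLOW entries.]

```c
#include <stddef.h>

/* Toy page directory mirroring the proposed encoding; sentinel
 * values match the patch below. */
#define MALLOC_FREE	((void *) 1)
#define MALLOC_ONE	((void *) 2)
#define MALLOC_MANY	((void *) 3)

#define NPAGES 16
static void *page_dir[NPAGES];

/* Record a chunk of `size' pages starting at page `idx'. */
static void
mark_chunk(size_t idx, size_t size)
{
	size_t i;

	if (size == 1)
		page_dir[idx] = MALLOC_ONE;
	else {
		page_dir[idx] = MALLOC_MANY;
		page_dir[idx + 1] = (void *)size;	/* page count */
		for (i = 2; i < size; i++)
			page_dir[idx + i] = MALLOC_FREE;
	}
}

/* O(1) size lookup -- no loop over the chunk's pages. */
static size_t
chunk_pages(size_t idx)
{

	if (page_dir[idx] == MALLOC_ONE)
		return 1;
	return (size_t)page_dir[idx + 1];
}
```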

This change produces a small but measurable benefit once the malloced
size grows beyond one page on my test machine (a PII-400).  Here are
some timings for each iteration of a hard loop of "foo = malloc(size);
free(foo);":

	size (bytes)	 orig code	  new code
	        50	   4.695us	   4.893us
	       500	   4.935us	   5.163us
	     5,000	   3.136us	   3.208us
	    50,000	   4.241us	   4.306us
	   500,000	  33.007us	  31.417us
	 5,000,000	 162.392us	 148.005us
	50,000,000	1331.593us	1092.631us
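[The timing harness isn't shown in the mail; a minimal reconstruction
of the measured loop might look like the sketch below.  Only the
malloc/free pair comes from the text above -- the gettimeofday(2)
harness and the averaging are assumptions.]

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

/* Time one malloc/free pair, averaged over `iters' iterations,
 * returning microseconds per iteration. */
static double
time_pair(size_t size, int iters)
{
	struct timeval tv0, tv1;
	int i;

	gettimeofday(&tv0, NULL);
	for (i = 0; i < iters; i++) {
		void *foo = malloc(size);
		free(foo);
	}
	gettimeofday(&tv1, NULL);
	return ((tv1.tv_sec - tv0.tv_sec) * 1e6 +
	    (tv1.tv_usec - tv0.tv_usec)) / iters;
}
```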

Diffs for this are below.

> * free(3) does a linear search down the free chunk list to add the
>   page(s) being freed.
> 
>   This can be solved by making the free list be one of several types
>   of tree structure.

A simple test program using a 307-element free list shows that a
simple tree based on tsearch takes approx 0.1us to find a random
element, while a list takes between 0.04us and 1.8us.  There are
obviously some gains to be made.  (My tsearch-based routines had a
hard-coded comparison instead of calling a function.)
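[For the curious, keeping free regions in a tsearch(3) tree keyed by
address might look like the sketch below.  `struct region' is a
stand-in for the allocator's struct pgfree, and the real code would
also need tdelete(3) when regions are coalesced or unlinked.]

```c
#include <search.h>
#include <stddef.h>

/* Stand-in for struct pgfree: a free region keyed by address. */
struct region {
	void	*page;
	size_t	 size;
};

static int
region_cmp(const void *a, const void *b)
{
	const struct region *ra = a, *rb = b;

	if (ra->page < rb->page)
		return -1;
	return ra->page > rb->page;
}

static void *root;

/* Insert a region; replaces the linear walk down the free list. */
static void
region_insert(struct region *r)
{

	(void)tsearch(r, &root, region_cmp);
}

/* Look up the region starting at `page', or NULL if absent. */
static struct region *
region_find(void *page)
{
	struct region key, **slot;

	key.page = page;
	slot = (struct region **)tfind(&key, &root, region_cmp);
	return slot ? *slot : NULL;
}
```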

However, tsearch doesn't do any balancing and would probably degenerate
into a simple linked list easily, and db(3) seems like _way_ too much
overhead.  Any pointers to a light-weight balanced btree implementation
that isn't GPL'd, before I dig out "Algorithms+Data Structures=Programs"?
Skip lists might also apply here...


Also, does anyone have a torture test for malloc(3)?

Simon.
--
Index: malloc.c
===================================================================
RCS file: /cvsroot/basesrc/lib/libc/stdlib/malloc.c,v
retrieving revision 1.26
diff -p -u -r1.26 malloc.c
--- malloc.c	1999/09/10 10:38:06	1.26
+++ malloc.c	1999/09/13 13:42:20
@@ -132,8 +132,8 @@ struct pgfree {
  */
 #define MALLOC_NOT_MINE	((struct pginfo*) 0)
 #define MALLOC_FREE 	((struct pginfo*) 1)
-#define MALLOC_FIRST	((struct pginfo*) 2)
-#define MALLOC_FOLLOW	((struct pginfo*) 3)
+#define MALLOC_ONE	((struct pginfo*) 2)
+#define MALLOC_MANY	((struct pginfo*) 3)
 #define MALLOC_MAGIC	((struct pginfo*) 4)
 
 /*
@@ -557,9 +557,14 @@ malloc_pages(size_t size)
     if (p) {
 
 	idx = ptr2idx(p);
-	page_dir[idx] = MALLOC_FIRST;
-	for (i=1;i<size;i++)
-	    page_dir[idx+i] = MALLOC_FOLLOW;
+	if (size == 1)
+	    page_dir[idx] = MALLOC_ONE;
+	else {
+	    page_dir[idx] = MALLOC_MANY;
+	    page_dir[idx+1] = (struct pginfo*)size;
+	    for (i=2;i<size;i++)
+		page_dir[idx+i] = MALLOC_FREE;
+	}
 
 	if (malloc_junk)
 	    memset(p, SOME_JUNK, size << malloc_pageshift);
@@ -757,7 +762,7 @@ irealloc(void *ptr, size_t size)
 
     mp = &page_dir[idx];
 
-    if (*mp == MALLOC_FIRST) {			/* Page allocation */
+    if (*mp == MALLOC_ONE || *mp == MALLOC_MANY) {	/* Page allocation */
 
 	/* Check the pointer */
 	if ((u_long)ptr & malloc_pagemask) {
@@ -766,8 +771,11 @@ irealloc(void *ptr, size_t size)
 	}
 
 	/* Find the size in bytes */
-	for (osize = malloc_pagesize; *++mp == MALLOC_FOLLOW;)
-	    osize += malloc_pagesize;
+	if (*mp == MALLOC_ONE)
+		osize = malloc_pagesize;
+	else {
+		osize = (int)page_dir[idx+1] * malloc_pagesize;
+	}
 
         if (!malloc_realloc && 			/* unless we have to, */
 	  size <= osize && 			/* .. or are too small, */
@@ -838,7 +846,7 @@ free_pages(void *ptr, int idx, struct pg
 	return;
     }
 
-    if (info != MALLOC_FIRST) {
+    if (info != MALLOC_ONE && info != MALLOC_MANY) {
 	wrtwarning("pointer to wrong page.\n");
 	return;
     }
@@ -848,12 +856,13 @@ free_pages(void *ptr, int idx, struct pg
 	return;
     }
 
-    /* Count how many pages and mark them free at the same time */
+    if (page_dir[idx] == MALLOC_ONE)
+	l = malloc_pagesize;
+    else {
+	l = (int)page_dir[idx+1] << malloc_pageshift;
+	page_dir[idx+1] = MALLOC_FREE;
+    }
     page_dir[idx] = MALLOC_FREE;
-    for (i = 1; page_dir[idx+i] == MALLOC_FOLLOW; i++)
-	page_dir[idx + i] = MALLOC_FREE;
-
-    l = i << malloc_pageshift;
 
     if (malloc_junk)
 	memset(ptr, SOME_JUNK, l);
@@ -1010,7 +1019,7 @@ free_bytes(void *ptr, int idx, struct pg
     *mp = info->next;
 
     /* Free the page & the info structure if need be */
-    page_dir[ptr2idx(info->page)] = MALLOC_FIRST;
+    page_dir[ptr2idx(info->page)] = MALLOC_ONE;
     vp = info->page;		/* Order is important ! */
     if(vp != (void*)info) 
 	ifree(info);