Subject: lib/31425: realloc(3) is inefficent
To: None <lib-bug-people@netbsd.org, gnats-admin@netbsd.org,>
From: None <njoly@pasteur.fr>
List: netbsd-bugs
Date: 09/30/2005 09:09:01
>Number:         31425
>Category:       lib
>Synopsis:       realloc(3) is inefficent
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    lib-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Sep 30 09:09:00 +0000 2005
>Originator:     Nicolas Joly
>Release:        NetBSD 3.99.8
>Organization:
Nicolas Joly
Institut Pasteur, Paris.
>Environment:
System: NetBSD lanfeust.sis.pasteur.fr 3.99.8 NetBSD 3.99.8 (LANFEUST) #1: Fri Sep 16 10:24:54 CEST 2005 njoly@lanfeust.sis.pasteur.fr:/local/src/NetBSD/obj/amd64/sys/arch/amd64/compile/LANFEUST amd64
Architecture: x86_64
Machine: amd64
>Description:
During some benchmarking tests for our biological programs, we noticed that
some NCBI blast runs took many more time on NetBSD/amd64 than on the same
machine running Linux RHEL4.1 (2h10 vs. 20min). After tracking this one a
little, we noticed that realloc(3) function was one of the culprit ...

In most realloc cases, the function allocate a new buffer, copy the content
and then free the old one, which is very inefficent when there are many
calls (as in our test case).

>How-To-Repeat:
njoly [/tmp/realloc]> cat realloc.c 
#include <stdlib.h>
#include <string.h>

int main() {
  size_t len, inc;
  char *buf;
  int i;
  
  buf = NULL; len = 8*1024*1024;
  inc = len / 4; i = -1;

  while (len <= 128*1024*1024) {
     buf = realloc(buf,len);
     if (buf == NULL) { return 1; }
     memset(buf, 0, len);
     len = len + i * inc;
     if (i == -1) { i = 2; } else { i = -1; }
     }

  return 0; }
njoly [/tmp/realloc]> cc -o realloc realloc.c 
njoly [/tmp/realloc]> time ./realloc 
./realloc  9.28s user 5.53s system 99% cpu 14.817 total

Profiling this small program shows that most of the time is spent in
memcpy() function from inside realloc():

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 64.89      9.52     9.52       79   120.51   120.51  memcpy
 28.36     13.68     4.16       80    52.00    52.00  memset
  6.54     14.64     0.96       26    36.92    36.92  brk
  0.14     14.66     0.02       84     0.24     0.24  sbrk
  0.07     14.67     0.01       28     0.36     0.36  extend_pgdir
[...]

The very same code, run on RHEL4.1, is about 3x faster :

njoly [/tmp/realloc]> uname -a
Linux raclette.calcul.pasteur.fr 2.6.9-11.ELsmp #1 SMP Wed Jun 8 16:59:12 CDT 2005 x86_64 x86_64 x86_64 GNU/Linux
njoly [/tmp/realloc]> time ./realloc
./realloc  4.96s user 0.23s system 99% cpu 5.192 total

>Fix: