Subject: RE: Performance Problem: malloc() is calling madvise()
To: Bill Dorsey <firstname.lastname@example.org>
From: Todd Vierling <email@example.com>
Date: 05/21/2000 02:37:59
On Sat, 20 May 2000, Bill Dorsey wrote:
: Looks like gnumalloc is comparable in performance to the old malloc
: and noticeably better than the new one (at least on my 1.4Y
: Alpha system).
Given that this is a C++ program, my conclusion is that "::operator new" is
called _a lot_ in the program. With that assumption, these results aren't
unexpected; gnumalloc is good with smaller-sized allocations.
There are some options that will tweak the new malloc a bit more; in
particular, see '<' and '>'.
: > significantly worse with the current malloc are in many cases
: > poorly designed;
: > they should manage their memory allocation themselves to a much greater
: > extent rather than calling malloc for 3 bytes all the time. If
: > you take this
: > as a criticism of C++ itself, you'd be pretty much on-target.
: Interesting. I always thought operating systems were designed for
: the express purpose of servicing applications programs, regardless
: of what language they are written in.
Part of that burden lies with the language compiler (in this case,
egcs/gcc). C++ uses an allocated-object model for much of its work,
particularly in applications with polymorphism; this implies a lot of calls
to the allocator.
In the context of gcc/egcs, "the allocator" here is nothing more than a
wrapper around malloc() and free(), which is IMNSHO suboptimal. However,
that strategy is considered a default fallback, and a programmer is
encouraged to introduce replacements for "class c::operator new" and "class
c::operator delete" at any time--or even for "::operator new" and
"::operator delete", the global entry points.
In my own libraries, I've often substituted "operator new" and "operator
delete" to use more efficient strategies for that type of object and its
subclasses. Some implementations have used fixed-size chunk allocations in
pools, under the assumption that the programmer may not subclass beyond the
existing implementation. This kind of situation-based strategy results in
_much_ faster code than any generic allocator could provide.
: It is my contention that in general, the operating system should
: trade memory for speed within some reasonable bounds. Typical
: systems today have 64 megabytes or more so there is no reason to
: try to squeeze every last byte out of a memory allocator if it
: has a significant speed penalty associated with it.
The penalty for allocations depends on the allocation strategy. In C, much
allocation [of temporaries] happens on the stack, which is orders of
magnitude faster than malloc(). In C++, though it isn't strictly necessary,
programmers tend to allocate temporaries explicitly on the heap, resulting
in many more calls to the allocator, with a small average size.
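A toy comparison makes the point; this is an illustrative sketch, not code from the thread, and the function names are invented. The C-style version keeps its temporary in an automatic array (a stack-pointer bump), while the C++-style version builds a std::string whose buffer typically comes from operator new, and therefore from malloc():

```cpp
#include <cstddef>
#include <string>

// Illustrative: count 'x' characters using a stack temporary vs. a
// heap-backed std::string temporary.  Same result, different allocators.

int count_x_on_stack(const char* s) {
    char buf[256];                 // stack: just a stack-pointer adjustment
    std::size_t i = 0;
    for (; s[i] && i < sizeof(buf) - 1; ++i)
        buf[i] = s[i];
    buf[i] = '\0';
    int n = 0;
    for (i = 0; buf[i]; ++i)
        if (buf[i] == 'x')
            ++n;
    return n;
}

int count_x_on_heap(const char* s) {
    std::string buf(s);            // heap: buffer usually via operator new
    int n = 0;
    for (std::string::size_type i = 0; i < buf.size(); ++i)
        if (buf[i] == 'x')
            ++n;
    return n;
}
```

Run in a loop, the second version pays one allocator round trip per call, which is exactly the per-temporary cost described above.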
(As a somewhat unrelated side point, the "typical" NetBSD system is probably
somewhere closer to 32MB, and some run as low as 4MB. <<pointing to
386SX-25 floppy-only NAT box in the corner>>)
: Obviously, there are exceptions to this rule if you are running on a
: palmtop computer or other similarly handicapped hardware. I don't
: believe it is reasonable to sacrifice performance across the board for
: the sake of a handful of machines with limited memory resources,
Feel free to compile/link an application with dmalloc, or gnumalloc, or your
choice of allocator. My own performance tests on non-object-intensive
programs suggest that my server has seen performance boosts from the new
malloc.
With all the above said, I agree that the new malloc could be improved, but
the blame for bad C++ performance lies on more than just malloc. 8^)
-- Todd Vierling (firstname.lastname@example.org)