Subject: lib/31112: malloc.conf and system programs
To: None <lib-bug-people@netbsd.org, gnats-admin@netbsd.org,>
From: Ken Raeburn <raeburn@MIT.EDU>
List: netbsd-bugs
Date: 09/01/2005 00:01:00
>Number:         31112
>Category:       lib
>Synopsis:       malloc.conf and system programs
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    lib-bug-people
>State:          open
>Class:          doc-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Sep 01 00:01:00 +0000 2005
>Originator:     Ken Raeburn
>Release:        NetBSD 2.0
>Organization:
	MIT
>Environment:
System: NetBSD venix.mit.edu 2.0 NetBSD 2.0 (GENERIC) #0: Tue Nov 30 21:04:03 UTC 2004 builds@build:/big/builds/ab/netbsd-2-0-RELEASE/alpha/200411300000Z-obj/big/builds/ab/netbsd-2-0-RELEASE/src/sys/arch/alpha/compile/GENERIC alpha
Architecture: alpha
Machine: alpha
>Description:

The malloc man page says:

 EXAMPLES
      To set a systemwide reduction of cache size, and to dump core whenever a
      problem occurs:

            ln -s 'A<' /etc/malloc.conf

I had made /etc/malloc.conf a symlink to 'AJ' and, many months later,
rebooted.  On startup, fsck_ffs died on an unclean file system with an
"allocation failed" error.  The file system is about 4G, the machine
has >1G of RAM, and unlimiting the process memory size didn't help.
Removing malloc.conf made it work just fine: fsck_ffs cleaned up some
unreferenced files, and my system came up.

While the man page describes how to set the 'A' option system-wide,
persistent across reboots, this may be a poor idea.  One thing the man
page does not mention is that the 'A' option is very bad for a program
that makes allocation requests but can handle failures cleanly and
continue.  Examples include a program that runs a garbage collector on
malloc failure, or a program that caches information it has read if it
can get the extra memory to do so -- which, at first glance, is what
fsck_ffs appears to do with inode data.
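
To illustrate the pattern, here is a minimal sketch of a program that
treats a cache as optional.  The names (struct inode_cache, cache_init,
cache_free) are hypothetical and are not taken from fsck_ffs.  Without
'A', a failed malloc returns NULL and the code falls back to a smaller
cache or none; with 'A', the man page says allocation failure becomes
fatal, so the fallback path would never get a chance to run.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical optional cache: a speedup, not needed for correctness. */
struct inode_cache {
	size_t	nentries;	/* 0 means "no cache; reread every time" */
	long	*entries;
};

/*
 * Try to allocate room for `wanted' entries.  If malloc fails, halve
 * the request and retry; give up quietly when the request reaches 0.
 * This only works if malloc reports failure by returning NULL.
 */
static void
cache_init(struct inode_cache *c, size_t wanted)
{
	c->entries = NULL;
	c->nentries = 0;

	while (wanted > 0) {
		/* Skip sizes whose byte count would overflow size_t. */
		if (wanted <= (size_t)-1 / sizeof(long)) {
			long *p = malloc(wanted * sizeof(*p));
			if (p != NULL) {
				c->entries = p;
				c->nentries = wanted;
				return;
			}
		}
		wanted /= 2;	/* failure is survivable: ask for less */
	}
}

/* Release the cache, leaving it in the "no cache" state. */
static void
cache_free(struct inode_cache *c)
{
	free(c->entries);
	c->entries = NULL;
	c->nentries = 0;
}
```

A caller would check c.nentries after cache_init() and simply skip the
cache when it is 0.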

I have no reason to believe that the 'J' option was causing any
problems, or that fsck_ffs has a bug relating to this situation.  (It
is possible that there is a bug, and that the failed allocation was
for some absurdly large size read from memory that the 'J' option had
filled with junk.  But it appears that there are parts of the fsck_ffs
code that cope well with running out of storage.)

>How-To-Repeat:

The fsck_ffs failure only happened when I had an unclean file system
with unreferenced files.  I don't know what other cases will trigger
it; a forced check of another, clean file system did not.

>Fix:

Don't recommend 'A' as a system-wide option; recommend it only for
per-process use, via the MALLOC_OPTIONS environment variable or the
malloc_options variable.
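
As a rough sketch of the per-program alternative, a program that wants
'A' and 'J' for itself could define the malloc_options variable in its
own source.  The exact declaration (char * versus const char *) is an
assumption here and should be checked against the malloc man page for
the release in use; on systems whose malloc does not consult this
variable, the definition has no effect.

```c
/*
 * Assumption: the system malloc reads a global string named
 * malloc_options, as the malloc man page describes.  Defining it here
 * affects only this program, not fsck_ffs or other system programs.
 */
char *malloc_options = "AJ";
```

The same options can be given to a single invocation by setting
MALLOC_OPTIONS in that process's environment, without touching
/etc/malloc.conf.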

Describe the sorts of cases where 'A' may be inappropriate, and how
one should be extra careful with options set via malloc.conf for
system-wide effect.  (Is 'A' the only one likely to be a problem?)