Subject: Re: Possible swapping bug (design 'feature?')
To: None <netbsd-bugs@sun-lamp.cs.berkeley.edu>
From: Christoph Badura <bad@flatlin.ka.sub.org>
List: netbsd-bugs
Date: 07/09/1994 12:21:00
Mike Kenenberger writes:
>The problem is that while a "ps aux" shows that the child process has exited
>and isn't using any memory, "/usr/sbin/swapinfo -k" shows that the virtual
>memory in use was never freed.  Additional fork() calls use additional virtual
>memory until the system finally freezes up when we run out of virtual memory.
>Terminating the original process (the parent of each child) finally releases
>the swap space allocated and the operating system returns to normal.

>Since this problem hasn't occurred under Mach, Linux, or SunOS, I assume it's
>a problem with NetBSD.  If nothing else, it's a non-standard implementation
>design.  What I'd like is a fix or a workaround since restarting the parent
>process every ten hours (by using 150 Megs swap space) isn't a desirable option.

The same problem was discussed on bsdi-users a while ago.  Here's the
final posting which includes comments from Mike Karels:

]From: maillist@candle.uucp (Bruce Momjian)
]Subject: Swap overallocation
]Date: 16 Dec 1993 07:50:11 +0100
]
]I am running BSD/386 from BSDI.  When running with 5MB of RAM, I found
]that the system locked up about once every week.  In researching the
]problem with Mike Karels of BSDI, I think we have found a bug that
]exists on BSD/386 and most free 386-based *BSD systems.  Here are the
]details.
]
]First, let me define copy-on-write (COW):  When a process forks, the OS
]maps the address space of both the parent and child to the same memory
]pages, and both processes start running.  If either process writes to
]one of these shared pages, the OS makes a copy of that page.  One
]process gets the original, the other gets the copy.
]
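
To make the COW definition above concrete, here is a minimal sketch (mine,
not part of the original posting).  It only shows the behaviour that COW
has to preserve: after fork(), a write by the parent is not seen by the
child.  Whether the kernel really copies the page lazily is not visible
from user space.  The function name child_view_after_parent_write and the
pipe used for synchronisation are my own assumptions for illustration.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork, let the parent overwrite a variable, and
 * return the value the child still sees (expected to be the original 1).
 * Returns -1 on error.
 */
int child_view_after_parent_write(void)
{
	volatile int value = 1;	/* sits in a page shared after fork() */
	int sync_pipe[2];
	int status;
	char byte = 0;
	ssize_t n;
	pid_t pid;

	if (pipe(sync_pipe) == -1)
		return -1;

	pid = fork();
	if (pid == -1)
	{
		close(sync_pipe[0]);
		close(sync_pipe[1]);
		return -1;
	}

	if (pid == 0)
	{
		/* child: block until the parent has done its write */
		close(sync_pipe[1]);
		if (read(sync_pipe[0], &byte, 1) != 1)
			_exit(255);
		_exit(value);	/* report the child's own view of the variable */
	}

	/* parent: this write is what forces the copy of the shared page */
	close(sync_pipe[0]);
	value = 2;
	n = write(sync_pipe[1], &byte, 1);
	close(sync_pipe[1]);

	if (waitpid(pid, &status, 0) == -1 || n != 1 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

Calling child_view_after_parent_write() should return 1 even though the
parent has set its own copy to 2 by then.
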
]Ok, here is the bug we have found:  If a process forks a child, and the
]parent writes to its memory pages (forcing a COW), and those pages are
]paged out to swap before the child exec's or exits, the parent's and
]child's<!> swap space is not released until the parent exits.
]
]The ramification of this is that if you have a long-running process
]that forks a lot, like a shell, and your system does a lot of paging,
]that long-running process will allocate more and more swap until it
]exits.  It is particularly a problem with non-csh shells (csh uses
]vfork and exec), because they often run scripts by forking themselves,
]and the child running the script may exist for quite some time without
]exec'ing or exit'ing.
]
]Here are Mike Karels' more detailed words on the subject:
]
]---------------------------------------------------------------------------
]
]... The problem here is that if the process forks, and the parent modifies
]data pages while the child exists, it must make copies of those pages
](copy-on-write after fork).  If those copies are paged out, then both
]the copies and the originals will occupy space until the parent exits,
]even if the child exits.
]
]I think I described the chains of shadow objects that were accumulating,
]and the fact that those are supposed to get coalesced.  It turns out
]that the code to coalesce does not work if an object has been paged out.
]This is the scenario that causes problems:
]
]	- a long-lived program forks repeatedly,
]	- the parent modifies data space before the child does exec or exit,
]	  and
]	- the parent's modified pages get paged out before the child does
]	  exec or exit.
]
]The only situation in which this seems to be a problem is if a login
]shell (or any long-running interactive shell) runs scripts by forking
]and running them directly.  This will not happen with csh; I don't
]know about ksh or bash.  (It does not happen with csh because it uses
]vfork, and re-exec's itself if running a csh script).  It also does
]not happen if the scripts are "executable" scripts, i.e. those that start
]with #!/bin/sh.  It is also a problem only if the script or other system
]activity uses enough memory for the shell to be paged out while the
]script is running.
]
]The bad news is that this problem is not easy to solve... However, I
]think there are some workarounds that can be used for the moment.
]
]---------------------------------------------------------------------------
]
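
Karels does not spell out the workarounds, but his remarks about csh and
about "#!" scripts have one thing in common: the child calls exec almost
immediately, so the window in which the parent can dirty pages shared with
a lingering child is very short.  Below is a minimal sketch of that
pattern, assuming plain fork() followed directly by execl() of /bin/sh.
The helper name run_with_exec is hypothetical, and I have not verified
that this avoids the swap growth on any particular kernel.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run `command` via /bin/sh, exec'ing right after
 * the fork so the child does not keep sharing the parent's pages.
 * Returns the command's exit status, or -1 on error.
 */
int run_with_exec(const char *command)
{
	int status;
	pid_t pid;

	pid = fork();
	if (pid == -1)
		return -1;

	if (pid == 0)
	{
		/* child: replace the address space at once */
		execl("/bin/sh", "sh", "-c", command, (char *)NULL);
		_exit(127);	/* only reached if the exec failed */
	}

	/* parent: reap the child and hand back its exit status */
	if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

For example, run_with_exec("exit 3") returns 3.  This is essentially what
system(3) does, without the signal handling.
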
]My experience with 5MB of RAM and 20MB of swap running several screens
](no X, no networking) was that because I never logged out, my shell
]accumulated swap space until it ran out.  About every 7 days the system
]had to be rebooted (everything had stopped running).
]
]I hope this helps explain some lockup problems some people may be
]having.  Has anyone solved this problem?  I don't know the specifics of
]why it is occurring, or why it is hard to solve, but if someone has
]already solved it, I would love to hear about it.
]
]Attached is a program that illustrates the problem.  With MAKE_CHILD
]undefined, swap space is allocated the first time through the loop, and
]stays pretty constant.  With MAKE_CHILD defined, free swap decreases rapidly
]each time through the loop until the system runs out of swap space and
]locks up.  Note that each child is killed before the loop is restarted,
]yet the free swap space continues to decline rapidly.  You will need to
]define some things at the top before you compile, including your system's
]program for monitoring swap space.
]
]---------------------------------------------------------------------------
]
]
]/* show swap overallocation bug in child processes */
]/* Bruce Momjian, root@candle.uucp */
]
]/* tabs = 4 */
]
]#include <stdio.h>
]#include <unistd.h>
]#include <stdlib.h>
]#include <signal.h>
]
]#define MAKE_CHILD
]
]/* make this higher if you have more than 8 MB of RAM */
]#define SYSTEM_RAM	4
]
]/* program to show remaining swap space, vmstat? */
]#define SHOWSWAP	"swaptotal"		
]
]int k = 1024;
]
]int main(void)
]{
]	char *y;
]	int c_pid;
]	int j;
]	char *t;
]
]	/* make my address space big */
]	if ( (y=malloc(SYSTEM_RAM*k*k)) == NULL)
]	{
]		perror("Malloc");
]		exit(1);
]	}
]
]	while (1)
]	{
]#ifdef MAKE_CHILD
]		if ((c_pid = fork()) == 0)
]		{
]			sleep(1000);
]			_exit(0);	/* child must not fall through into the loop */
]		}
]#endif
]
]		/* parent touches memory to force COW copy */
]		for (j=0,t=y; j < SYSTEM_RAM*k*k; j+=k)
]		{
]			*t = 'x';
]			t += k;
]		}
]
]#ifdef MAKE_CHILD
]		if (c_pid > 0)		/* kill(-1, ...) would signal every process */
]			kill(c_pid,SIGHUP);
]#endif
]
]		puts("done ");
]		system(SHOWSWAP);
]	}
]	/* NOT REACHED */
]}
]



-- 
Christoph Badura	bad@flatlin.ka.sub.org          Home +49 721 606137
						        Work +49 228 445175
Es genuegt nicht, keine Gedanken zu haben;
man muss auch unfaehig sein, sie auszudruecken.  - Karl Kraus
