Subject: kern/10765: long "freeze" when killing processes that cause heavy paging
To: None <gnats-bugs@gnats.netbsd.org>
From: Simon Burge <simonb@wasabisystems.com>
List: netbsd-bugs
Date: 08/06/2000 00:51:18
>Number:         10765
>Category:       kern
>Synopsis:       long system freeze when killing processes that cause heavy paging
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Aug 06 00:52:00 PDT 2000
>Closed-Date:
>Last-Modified:
>Originator:     Simon Burge
>Release:        NetBSD-current 20000806 sources
>Organization:
Wasabi Systems
>Environment:
	System: NetBSD wincen 1.5D NetBSD 1.5D (WINCEN) #328: Sun Aug 6 13:51:23 EST 2000
	simonb@wincen:/usr/obj/sys/arch/i386/compile/WINCEN i386

>Description:
	The system appears to freeze for a while (90 seconds in my case)
	when killing processes that cause heavy paging.  When the
	system becomes responsive again, other similar processes don't
	continue.

>How-To-Repeat:
	On an i386 with 128MB of RAM (116MB available), run three copies
	of the following program, each with 24576 as an argument:

		#include <err.h>
		#include <stdio.h>
		#include <stdlib.h>
		#include <time.h>
		#include <unistd.h>

		int
		main(int argc, char **argv)
		{
			char **foo;
			int i, size;

			srand(getpid() ^ time(NULL));

			if (argc > 1)
				size = atoi(argv[1]);
			else
				size = 0;

			if (size > 0) {
				/* Allocate "size" 4KB chunks and touch each one. */
				foo = (char **)malloc(size * sizeof(char *));
				if (foo == NULL)
					errx(1, "no memory");

				for (i = 0; i < size; i++) {
					foo[i] = malloc(4096);
					if (foo[i] == NULL)
						errx(1, "no memory");
					foo[i][0] = 0;
				}
				/* Keep dirtying random pages to force paging. */
				while (1) {
					i = rand() % size;
					foo[i][0] = 0;
				}
			}
			return 0;
		}
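
	With 24576 as the argument, each copy touches roughly 24576 * 4KB,
	i.e. about 96MB of anonymous memory, so the three copies together
	want around 288MB against the 116MB available, which forces heavy
	paging.  A minimal way to build and start them, assuming the program
	above is saved as memhog.c (the filename is only for illustration):

		$ cc -o memhog memhog.c
		$ ./memhog 24576 &
		$ ./memhog 24576 &
		$ ./memhog 24576 &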

	When one of these is killed, the system appears to wedge.  From
	DDB, the trace of the killed process is:

		db> t/t 0t288
		cpu_Debugger(c0570c60,c0776540,c0570540,c01233ed,0) at cpu_Debugger+0x4
		comintr(c0580a00) at comintr+0xcd
		Xintr4() at Xintr4+0x70
		--- interrupt ---
		extent_free(c0570540,176ee,1,10) at extent_free+0x16e
		uvm_swap_free(176ef,1,c03c5498,c9c0de74,c01e4898) at uvm_swap_free+0x5b
		uvm_anon_dropswap(c9abcd00,463,c9bf44dc,c9c0de88,c01e3f23) at uvm_anon_dropswap+0x16
		uvm_anfree(c9abcd00) at uvm_anfree+0x78
		amap_wipeout(c9bf44dc,c9bf3ccc,c9bf3ccc,0,c9c0debc) at amap_wipeout+0x3b
		amap_unref(c9bf3ccc,0) at amap_unref+0x1a
		uvm_unmap_detach(c9bf3f0c,0,c995ec38,f,c9bf3f0c) at uvm_unmap_detach+0x31
		uvm_unmap(c995ec38,0,bfbfe000,c995ec38,c9c0df1c) at uvm_unmap+0xb3
		uvm_deallocate(c995ec38,0,bfbfe000) at uvm_deallocate+0x38
		exit1(c9be1968,f,f,c9be1968,c9be8354) at exit1+0x13f
		sigexit(c9be1968,f,c9be1968,8048a0c,106) at sigexit+0x9e
		postsig(f) at postsig+0xab
		trap() at trap+0x526
		--- trap (number 6) ---
		 0x8048a0c:

	After approximately 90 seconds the system comes back to life, but
	the two remaining memory-hog processes don't continue.  A trace of
	one of these is:

		db> t/t 0t284
		trace: pid 284 at 0xc9c03d88
		bpendtsleep(c05e3558,11,c02419f5,0,0) at bpendtsleep
		biowait(c05e3558,2,13d5b,c9c03f40,9ead8) at biowait+0x31
		uvm_swap_io(c9c03e20,13d5b,1,100000,c0410208) at uvm_swap_io+0x226
		uvm_swap_get(c0410208,13d5b,2,c9c03ee0,c9c03f0c) at uvm_swap_get+0x51
		uvmfault_anonget(c9c03f40,c9bf439c,c9ad4db0,d98f000,0) at uvmfault_anonget+0x188
		uvm_fault(c995ea10,d98f000,0,3,0) at uvm_fault+0x989
		trap() at trap+0x409
		--- trap (number 6) ---
		 0x8048a0c:

	Killing each of the remaining memory hogs results in a similar
	"freeze".

>Fix:
>Release-Note:
>Audit-Trail:
>Unformatted: