Subject: kern/16364: [dM] Running out of RAM loses badly
To: None <gnats-bugs@gnats.netbsd.org>
From: der Mouse <mouse@Rodents.Montreal.QC.CA>
List: netbsd-bugs
Date: 04/15/2002 14:01:56
>Number:         16364
>Category:       kern
>Synopsis:       [dM] Running out of RAM loses badly
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Apr 15 11:03:00 PDT 2002
>Closed-Date:
>Last-Modified:
>Originator:     der Mouse
>Release:        i386 snapshot of 2002-04-10
>Organization:
	Dis-
>Environment:
	1.5ZC on P-III/800 (see below for dmesg)
>Description:
	-current behaves very badly upon running out of memory.  I have
	seen it hang and I have seen it do "kernel: page fault trap"
	and drop into ddb.  Unfortunately it seems extremely sensitive
	to details (see below), but I have never seen it behave well
	(not even as well as possible given the circumstances).
>How-To-Repeat:
	Boot single-user.  Don't swapconfig any swap space.  Grab most
	of the machine's RAM and hold onto it.  Mount an mfs large
	enough to overrun the remaining RAM.  Copy stuff into it and
	watch it fail.

	Specifically, here's what I did, starting at the first
	single-user prompt (this machine has /usr in /, and /mnt/mouse
	is where all my stuff lives - this machine was swiped from
	other uses and I am trying to disturb it minimally):

	# set -o emacs
	# cd /mnt/mouse
	# ls
	...
	# ./mmap -unit 5m 150m -w &
	# env TERM=dumb top | egrep Memory
	Memory: 152M Act, 100K Wired, 74M Free
	# mount_mfs -s 200000 swap /tmp
	# cp -r /usr/src /tmp &
	# jobs
	...
	# 

	At this point I wait until it fails.  With exactly this
	sequence, it crashes with this (ten-finger copy, whitespace not
	exact):

	uvm_fault(0xc064c000, 0x0, 0, 1) -> e
	kernel: page fault trap, code=0
	Stopped in pid 76 (ioflush) at genfs_putpages:0x268: movl 0x24(%esi),%eax
	db> 
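
	As a rough check on the sizes in the transcript above, the
	arithmetic can be sketched in C.  It assumes that mount_mfs -s
	counts 512-byte sectors and that the "74M Free" figure from top
	means 74 * 2^20 bytes (top rounds, so this is approximate).
	The function names are made up for illustration.  Under those
	assumptions, 200000 sectors is 102400000 bytes, roughly 24 MB
	more than was free.

```c
/* Assumption: mount_mfs -s takes its size in 512-byte sectors. */
enum { SECTOR_BYTES = 512 };

/* Size in bytes of an mfs created with "-s sectors". */
static unsigned long long mfs_bytes(unsigned long long sectors)
{
    return sectors * SECTOR_BYTES;
}

/* Bytes by which a full mfs would exceed free RAM; 0 if it fits. */
static unsigned long long shortfall(unsigned long long mfs,
                                    unsigned long long free_bytes)
{
    return mfs > free_bytes ? mfs - free_bytes : 0;
}
```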

	However, it's extremely sensitive to details; on a run when I
	misspelled mount_mfs and got an error (and then got it right on
	the second try), it hung instead.  Breaking into ddb revealed
	that about six processes were runnable, one of them being the
	swapper, which (because its priority was 4, numerically much
	lower and therefore more favored than anything else runnable)
	was locking all the others out.  On another run, I skipped the
	ls and the top, and copied /usr/src/sys instead; this time, it
	hung with the cp and the pagedaemon livelocking with one
	another (at priority 4 and therefore starving out everything
	else).

	I feel confident that while the details of the failure will
	vary, some failure will occur; I've never had it "work" even to
	the extent of "killed: out of swap" with the rest of the system
	coming back.

	The kernel came from the 2002-04-10 i386 snapshot; its MD5 is 
	9aeca60034306d76d098ed7ba597aab9.  Full dmesg for the machine
	and kernel in question is below.  The mmap tester program
	("mmap" in the transcript above) is a little large to include
	here, even after bzip -9 | btoa; I've put a copy at
	ftp.netbsd.org:/pub/NetBSD/misc/mouse/mmap-stress.c
	and I'll be happy to mail copies as well.
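
	For anyone who cannot fetch that file, a stand-in for the
	"grab RAM and hold onto it" step can be sketched as below.
	This is not mmap-stress.c.  It assumes that "-unit 5m" means
	a 5 MB mapping size, "150m" is the total to grab, and "-w"
	means write to every page; those meanings are guesses from the
	transcript, and parse_size and grab_memory are hypothetical
	names.

```c
#define _DEFAULT_SOURCE        /* exposes MAP_ANON on glibc; harmless elsewhere */
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Parse sizes such as "5m" or "150m" (k/m/g are powers of 1024).
 * Returns 0 on malformed input. */
static size_t parse_size(const char *s)
{
    char *end;
    unsigned long long n = strtoull(s, &end, 10);

    if (end == s)
        return 0;
    switch (*end) {
    case 'k': case 'K': n <<= 10; end++; break;
    case 'm': case 'M': n <<= 20; end++; break;
    case 'g': case 'G': n <<= 30; end++; break;
    default: break;
    }
    return *end == '\0' ? (size_t)n : 0;
}

/* Map `total` bytes of anonymous memory in chunks of `unit` bytes.
 * If do_write is nonzero, store one byte per page so that each page
 * really gets backed by RAM.  The mappings are deliberately never
 * released.  Returns the number of bytes mapped. */
static size_t grab_memory(size_t total, size_t unit, int do_write)
{
    long pagesize = sysconf(_SC_PAGESIZE);
    size_t done = 0;

    if (unit == 0 || pagesize <= 0)
        return 0;
    while (done < total) {
        size_t len = total - done < unit ? total - done : unit;
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_ANON | MAP_PRIVATE, -1, 0);

        if (p == MAP_FAILED) {
            perror("mmap");
            break;
        }
        if (do_write)
            for (size_t off = 0; off < len; off += (size_t)pagesize)
                p[off] = 1;
        done += len;
    }
    return done;
}
```

	Calling grab_memory(parse_size("150m"), parse_size("5m"), 1)
	from main() and then pause() would approximate the
	"./mmap -unit 5m 150m -w &" step, under the assumed meaning of
	those flags.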

	NetBSD 1.5ZC (GENERIC) #0: Wed Apr 10 13:45:59 CEST 2002
	    tron@lyssa.zhadum.de:/src/NetBSD-current/src/sys/arch/i386/compile/GENERIC
	cpu0: Intel Pentium III (Coppermine) (686-class), 797.45 MHz
	cpu0: I-cache 16 KB 32b/line 4-way, D-cache 16 KB 32b/line 2-way
	cpu0: L2 cache 256 KB 32b/line 8-way
	cpu0: features 383f9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR>
	cpu0: features 383f9ff<PGE,MCA,CMOV,FGPAT,PSE36,MMX>
	cpu0: features 383f9ff<FXSR,SSE>
	total memory = 255 MB
	avail memory = 230 MB
	using 3294 buffers containing 13176 KB of memory
	BIOS32 rev. 0 found at 0xfda74
	mainbus0 (root)
	pci0 at mainbus0 bus 0: configuration mode 1
	pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
	pchb0 at pci0 dev 0 function 0
	pchb0: Intel 82815 Hub (rev. 0x02)
	pchb0: random number generator enabled
	agp at pchb0 not configured
	ppb0 at pci0 dev 1 function 0: Intel 82815 AGP (rev. 0x02)
	pci1 at ppb0 bus 1
	pci1: i/o space, memory space enabled
	vga1 at pci1 dev 0 function 0: Nvidia Corporation RIVA TNT2 Model 64 (rev. 0x15)
	wsdisplay0 at vga1 kbdmux 1: console (80x25, vt100 emulation)
	wsmux1: connecting to wsdisplay0
	ppb1 at pci0 dev 30 function 0: Intel 82801BA Hub-to-PCI Bridge (rev. 0x02)
	pci2 at ppb1 bus 2
	pci2: i/o space, memory space enabled
	ex0 at pci2 dev 9 function 0: 3Com 3c905C-TX 10/100 Ethernet with mngmt (rev. 0x74)
	ex0: interrupting at irq 3
	ex0: MAC address 00:01:02:39:93:fa
	bmtphy0 at ex0 phy 24: Broadcom 3c905C internal PHY, rev. 6
	bmtphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
	ti0 at pci2 dev 10 function 0: 3Com 3c985-SX Gigabit Ethernet (rev. 0x01)
	ti0: interrupting at irq 11
	ti0: Ethernet address: 00:60:08:f6:1e:81
	ti1 at pci2 dev 12 function 0: 3Com 3c985-SX Gigabit Ethernet (rev. 0x01)
	ti1: interrupting at irq 9
	ti1: Ethernet address: 00:60:08:f5:e3:96
	pcib0 at pci0 dev 31 function 0
	pcib0: Intel 82801BA LPC Interface Bridge (rev. 0x02)
	pciide0 at pci0 dev 31 function 1: Intel 82801BA IDE Controller (ICH2) (rev. 0x02)
	pciide0: bus-master DMA support present
	pciide0: primary channel wired to compatibility mode
	wd0 at pciide0 channel 0 drive 0: <QUANTUM FIREBALLP LM10.2>
	wd0: drive supports 16-sector PIO transfers, LBA addressing
	wd0: 9729 MB, 16383 cyl, 16 head, 63 sec, 512 bytes/sect x 19925880 sectors
	wd0: 32-bit data port
	wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 4 (Ultra/66)
	pciide0: primary channel interrupting at irq 14
	wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 4 (Ultra/66) (using DMA data transfers)
	pciide0: secondary channel wired to compatibility mode
	atapibus0 at pciide0 channel 1: 2 targets
	cd0 at atapibus0 drive 0: <Lite-On LTN483S 48x Max, , PD02> type 5 cdrom removable
	cd0: 32-bit data port
	cd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 2 (Ultra/33)
	pciide0: secondary channel interrupting at irq 15
	cd0(pciide0:1:0): using PIO mode 4, Ultra-DMA mode 2 (Ultra/33) (using DMA data transfers)
	uhci0 at pci0 dev 31 function 2: Intel 82801BA USB Controller (rev. 0x02)
	uhci0: interrupting at irq 10
	usb0 at uhci0: USB revision 1.0
	uhub0 at usb0
	uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
	uhub0: 2 ports with 2 removable, self powered
	Intel 82801BA SMBus Controller (SMBus serial bus, revision 0x02) at pci0 dev 31 function 3 not configured
	isa0 at pcib0
	com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
	pckbc0 at isa0 port 0x60-0x64
	pckbd0 at pckbc0 (kbd slot)
	pckbc0: using irq 1 for kbd slot
	wskbd0 at pckbd0: console keyboard, using wsdisplay0
	lpt0 at isa0 port 0x378-0x37b irq 7
	pcppi0 at isa0 port 0x61
	midi0 at pcppi0: PC speaker
	sysbeep0 at pcppi0
	isapnp0 at isa0 port 0x279: ISA Plug 'n Play device support
	npx0 at isa0 port 0xf0-0xff: using exception 16
	fdc0 at isa0 port 0x3f0-0x3f7 irq 6 drq 2
	fd0 at fdc0 drive 0: 1.44MB, 80 cyl, 2 head, 18 sec
	isapnp0: no ISA Plug 'n Play devices found
	biomask f565 netmask ff6d ttymask ffef
	Kernelized RAIDframe activated
	boot device: wd0
	root on wd0a dumps on wd0b
	root file system type: ffs
>Fix:
	Unknown.
>Release-Note:
>Audit-Trail:
>Unformatted: