Subject: kern/8380: Bug in `extent_free' causes panics after heavy load. Maybe it's PR 7481
To: None <gnats-bugs@gnats.netbsd.org>
From: None <Reinoud.Zandijk@ismaelda.netbsd.org>
List: netbsd-bugs
Date: 09/12/1999 04:20:56

>Number:         8380
>Category:       kern
>Synopsis:       Bug in `extent_free' causes panics after heavy load. Maybe related to PR 7481
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people (Kernel Bug People)
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Sep 12 04:20:00 1999
>Last-Modified:
>Originator:     Reinoud Zandijk
>Organization:
	
>Release:        NetBSD-release, source date 11 September 1999
>Environment:
Acorn RiscPC, ARM710, 16+8 MB DRAM, 2 MB VRAM, NetBSD 1.4.1 userland + kernel, Atomwide Etherlan 3.
	
System: NetBSD ismaelda 1.4.1 NetBSD 1.4.1 (config.ismaelda) #0: Sat Sep 11 22:24:29 CEST 1999 root@ismaelda:/usr/sources/release/src/sys/arch/arm32/compile/config.ismaelda arm32


>Description:

The problem shows itself as a kernel panic after the machine has been under heavy load. It occurs very often. Since I was upgrading my system to 1.4.1 and no precompiled binaries were available, I recompiled a lot of software.

Because I compiled the in-kernel debugger (DDB) into the kernel, the panic drops into a debugger session, unfortunately without a core dump. I managed to squeeze the following information out of it:


One crash:
"
extent 'swap 0x0000' (0x0 - 0xf617), flags = 0x0
     0x0 - 0x2b0
     0x0 - 0x0

extent_free : start 0x49b, end 0x49b
panic: extent_free : region not found
trace:
  _Debugger
  _panic
  _extent_free
  _uvm_swap_free
  _uvm_anon_dropswap
  _uvm_anfree
  _amap_wipeout
  _amap_unref
  _uvm_unmap_detach
  _uvm_unmap
  _uvm_deallocate
  _exit1
  _sys_exit
  _syscall
"

Another crash:
"
[u]vmfault (0xf0116ed4,0,3,0) -> 1
unhandled trap (frame=0xf3478d4c
Data abort: 'Permission error (page)' status 00f
  address = 000072c
  pc = f0024c1c

stopped in pagedeamon at _extent_destroy + 0x12c: streq r3,[r14, #0x00c]
"

In all the other crashes I get `just' a page fault somewhere in `extent_free'. I tried to pin down the exact location and found that the kernel crashed at line 964, "LIST_INSERT_AFTER(rp, nrp, er_link)", in /usr/src/sys/kern/subr_extent.c.

My guess is that memory gets more and more fragmented and that, as a result, some table either runs out of space or runs into a protected page.

Curiously, it may be related to screen handling: most of the crashes occurred when I switched to another virtual terminal and typed a command. The system falls over when the command starts or finishes executing.


	
>How-To-Repeat:

I managed, unintentionally, to reproduce the error several times. Start with a fresh system, extract a huge tar file (for example a kernel source tree), delete an old kernel source tree and compile the new one. At the same time, a Sun SLC was compiling its new kernel on the RiscPC.

	
>Fix:
Silly: don't stress the machine this hard.
Serious: I really don't know. It may be an interaction between the port-specific (arm32) code and the machine-independent kernel code.
	

>Audit-Trail:
>Unformatted: