Subject: kern/22786: panic in uao_find_swslot
To: None <gnats-bugs@gnats.netbsd.org>
From: None <chris@pin.lu>
List: netbsd-bugs
Date: 09/14/2003 16:50:20
>Number:         22786
>Category:       kern
>Synopsis:       panic in uao_find_swslot under high page fault load
>Confidential:   no
>Severity:       critical
>Priority:       low
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Sep 14 14:51:00 UTC 2003
>Closed-Date:
>Last-Modified:
>Originator:     Christian Limpach <chris@pin.lu>
>Release:        NetBSD 1.6Z (current as of 2003-09-13)
>Organization:
	
>Environment:
	
	
System: NetBSD 1.6Z (OITO.MP) #183: Sat Sep 13 20:25:13 CEST 2003
Architecture: i386
Machine: i386
>Description:
The following panic occurred while I was running some tests to exercise the
SA/pagefault code.  I run the tests on an SMP machine.

uvm_fault(0xc0414000, 0xc0d47000, 0, 1) -> 0xe
kernel: page fault trap, code=0
Stopped in pid 2.1 (pagedaemon) at      netbsd:uao_find_swslot+0x3b:    movl    0(%eax,%ebx,4),%eax
db{0}> bt
uao_find_swslot(cb8ac000,deadb,1eb,c02e5cd3,1ed) at netbsd:uao_find_swslot+0x3b
uvmpd_scan_inactive(c0442a74,c03b6f94,2f9,0,7956) at netbsd:uvmpd_scan_inactive+0x42f
uvmpd_scan(cb3f5084,c02e54a0,0,c0100a1d,0) at netbsd:uvmpd_scan+0x82
uvm_pageout(cb3f5084,0,0,0,c010030c) at netbsd:uvm_pageout+0x12e
db{0}> mach cpu 1
using cpu 1
db{0}> bt
_lockmgr(c043eb60,400002,0,c03a552c,52f) at netbsd:_lockmgr+0xae2
_kernel_proc_lock(cb8c431c,cb8fcf7c,4,c08e1800,0) at netbsd:_kernel_proc_lock+0x56
syscall_plain(cb8fcfa8,bfbf001f,1f,1f,bfbf001f) at netbsd:syscall_plain+0xb3

(gdb) bt           
#0  0xc02d3aff in uao_find_swslot (uobj=0xcb8ac000, pageidx=912091)
    at ../../../../uvm/uvm_aobj.c:303
#1  0xc02e5bd3 in uvmpd_scan_inactive (pglst=0xc0442a74)
    at ../../../../uvm/uvm_pdaemon.c:572
#2  0xc02e5f8a in uvmpd_scan () at ../../../../uvm/uvm_pdaemon.c:778
#3  0xc02e55ce in uvm_pageout (arg=0xcb3f5084) at ../../../../uvm/uvm_pdaemon.c:253

(gdb) f 0
#0  0xc02d3aff in uao_find_swslot (uobj=0xcb8ac000, pageidx=912091)
    at ../../../../uvm/uvm_aobj.c:303
303             return(aobj->u_swslots[pageidx]);
(gdb) p/x pageidx
$13 = 0xdeadb

(gdb) up
#1  0xc02e5bd3 in uvmpd_scan_inactive (pglst=0xc0442a74)
    at ../../../../uvm/uvm_pdaemon.c:572
572                                             slot = uao_find_swslot(uobj,
(gdb) p anon
$14 = (struct vm_anon *) 0x0
(gdb) p uobj
$15 = (struct uvm_object *) 0xcb8ac000
(gdb) p p
$34 = (struct vm_page *) 0xc07b7200
(gdb) p/x *p
$16 = {pageq = {tqe_next = 0x0, tqe_prev = 0xc0861f80}, hashq = {tqe_next = 0x0, 
    tqe_prev = 0xc39e6178}, listq = {tqe_next = 0xc0724738, tqe_prev = 0xc075f268}, 
  uanon = 0xdeadbeef, uobject = 0xdeadbeef, offset = 0xdeadbeef, flags = 0x8, 
  loan_count = 0x0, wire_count = 0x0, pqflags = 0x1, phys_addr = 0x6044000, mdpage = {
    mp_pvhead = {pvh_lock = {lock_data = 0x0, lock_file = 0xc03bf7e0, 
        unlock_file = 0xc03bf7e0, lock_line = 0xaa5, unlock_line = 0xabe, list = {
          tqe_next = 0x0, tqe_prev = 0x0}, lock_holder = 0xffffffff}, pvh_list = 0x0}, 
    mp_attrs = 0x6044407}}
[(un)lock_file in the above is i386/pmap.c]
(gdb) p nextpg
$35 = (struct vm_page *) 0xc085db70
(gdb) p *pglst
$37 = {tqh_first = 0xc085db70, tqh_last = 0xc08979d8}
(gdb) p/x *nextpg
$33 = {pageq = {tqe_next = 0xc06445e8, tqe_prev = 0xc0442a74}, hashq = {tqe_next = 0x0, 
    tqe_prev = 0xc39cdc58}, listq = {tqe_next = 0xc0851e08, tqe_prev = 0xcb8a59b4}, 
  uanon = 0xc39c4278, uobject = 0x0, offset = 0x0, flags = 0x8, loan_count = 0x0, 
  wire_count = 0x0, pqflags = 0x12, phys_addr = 0x7e8e000, mdpage = {mp_pvhead = {
      pvh_lock = {lock_data = 0x0, lock_file = 0xc03bf7e0, unlock_file = 0xc03bf7e0, 
        lock_line = 0xaa5, unlock_line = 0xabe, list = {tqe_next = 0x0, tqe_prev = 0x0}, 
        lock_holder = 0xffffffff}, pvh_list = 0xc09f68b0}, mp_attrs = 0x7e8ec47}}
(gdb) p *uobj
$26 = {vmobjlock = {lock_data = 1, 
    lock_file = 0xc03b6f94 "../../../../uvm/uvm_pdaemon.c", 
    unlock_file = 0xc03b6f94 "../../../../uvm/uvm_pdaemon.c", lock_line = 491, 
    unlock_line = 883, list = {tqe_next = 0x0, tqe_prev = 0xc0442a8c}, lock_holder = 0}, 
  pgops = 0xc03feb2c, memq = {tqh_first = 0xc070aed8, tqh_last = 0xc0825aa8}, 
  uo_npages = 7, uo_refs = 3}

[looking through uobj memq] p is not on uobj's memq

AFAICT uvmpd_scan_inactive does not handle the condition where PG_CLEAN is
set, anon == NULL and p->loan_count == 0.  I think that uvm_fault can
create this condition but I don't know if it shouldn't create it or if
uvmpd_scan_inactive needs to handle it.  Then again, it might be something
completely different.
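
One detail in the gdb output above may be worth spelling out: the pageidx
value 0xdeadb looks like the poisoned p->offset (0xdeadbeef) shifted right
by the page shift.  The following is only a sketch of that arithmetic.  It
assumes 4 KB pages (a shift of 12) on i386, 4-byte swap slot entries, and
that the caller at uvm_pdaemon.c:572 passes p->offset >> PAGE_SHIFT.  The
helper names are hypothetical and are not taken from the kernel source.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: i386 with 4 KB pages, i.e. a page shift of 12. */
#define SKETCH_PAGE_SHIFT 12

/* Hypothetical helper: page index for a byte offset within an object. */
static uint32_t
sketch_page_index (uint32_t offset)
{
  return offset >> SKETCH_PAGE_SHIFT;
}

/*
 * Hypothetical helper: byte displacement of entry `pageidx' in an array
 * of 4-byte slots, matching the scale factor in "movl 0(%eax,%ebx,4)".
 */
static uint32_t
sketch_slot_displacement (uint32_t pageidx)
{
  return pageidx * (uint32_t)sizeof (int32_t);
}

/* Prints the values derived from the poisoned offset seen in *p. */
static void
sketch_report (void)
{
  uint32_t idx = sketch_page_index (0xdeadbeefu);

  printf ("pageidx = 0x%x (%u), displacement = 0x%x\n",
      (unsigned)idx, (unsigned)idx,
      (unsigned)sketch_slot_displacement (idx));
}
```

If those assumptions hold, the index is 0xdeadb (912091), which matches the
pageidx gdb reports, and the load lands roughly 3.5 MB past the start of
u_swslots.  That would explain the page fault if the array is smaller than
that, and it might suggest that p had already been freed (its fields
poisoned) by the time the scan used its offset.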

>How-To-Repeat:
Unknown.  Cause a lot of page faults?  I've run tests like these for
 >20 hours without seeing this failure :-(

I use the following program to cause page faults on a 128MB machine:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define MB 116
#define PS (4*1024)

int
main (int argc, char **argv, char **envp)
{
  int i;
  char *x;
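  for (i = 0; i < MB*1024*1024/PS; i++) {
    x = (char *)malloc (PS);
    if (x == NULL)	/* out of memory: stop instead of dereferencing NULL */
      return 1;
    x[0] = 0;		/* touch the allocation so the page is faulted in */
  }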
  printf ("done\n");
  sleep(10);
  execve (argv[0], argv, envp);
  return 0;
}

My kernel is patched with the following patch:
http://lola.pin.lu/netbsd/kernel-patches/sa-pagefault-030913.patch

>Fix:
	
>Release-Note:
>Audit-Trail:
>Unformatted: