Subject: kern/16067: [dM] shared libraries deadlock
To: None <gnats-bugs@gnats.netbsd.org>
From: der Mouse <mouse@Rodents.Montreal.QC.CA>
List: netbsd-bugs
Date: 03/26/2002 11:10:14
>Number:         16067
>Category:       kern
>Synopsis:       [dM] Deadlock when loading shared libraries
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Mar 26 08:12:00 PST 2002
>Closed-Date:
>Last-Modified:
>Originator:     der Mouse
>Release:        Proprietary PPC port derived from 1.5W
>Organization:
	Dis-
>Environment:
	Proprietary PPC port derived from 1.5W
>Description:
	I've been asked to post this by a company that's been working
	with NetBSD.  I don't know much about the problem beyond what's
	here, as I did not see this myself, but I can get mail back to
	the actual originators.  (I realize this may seem to be a
	rather disorganized collection of information; it's what they
	sent me, and as I say, I didn't see it happen myself.  I'm not
	entirely sure why they asked me to post it instead of doing it
	themselves.  I'm trying to help them some in tracking it down
	myself; if anyone has any ideas, I/we would most appreciate
	hearing them.)

Symptom: deadlock situation when running script loading applications linked w/ 
shared libs

Proprietary NetBSD PPC port derived from the 1.5W

/*	$NetBSD: param.h,v 1.128 2001/06/03 02:48:45 thorpej Exp $	*/
    #define	__NetBSD_Version__	105230000	/* NetBSD 1.5W */
/*	$NetBSD: sys_process.c,v 1.67 2001/03/17 09:38:36 pooka Exp $	*/
/*	$NetBSD: kern_synch.c,v 1.104 2001/05/28 22:20:03 chs Exp $	*/
/*	$NetBSD: kern_lock.c,v 1.55 2001/06/05 04:38:09 thorpej Exp $	*/
/*	$NetBSD: uvm_fault_i.h,v 1.13 2001/06/02 18:09:26 chs Exp $	*/
/*	$NetBSD: uvm_map.h,v 1.28 2001/06/02 18:09:27 chs Exp $	*/
/*	$NetBSD: uvm_fault.c,v 1.64 2001/06/02 18:09:26 chs Exp $	*/
/*	$NetBSD: uvm_map.c,v 1.99 2001/06/02 18:09:26 chs Exp $	*/
/*	$NetBSD: uvm_io.c,v 1.15 2001/06/02 18:09:26 chs Exp $	*/
/*	$NetBSD: uvm_vnode.c,v 1.50 2001/05/26 21:27:21 chs Exp $	*/
/*	$NetBSD: procfs_mem.c,v 1.27 2000/11/24 18:58:37 chs Exp $	*/
/*	$NetBSD: layer_vnops.c,v 1.6 2001/06/07 13:32:47 wiz Exp $	*/

In platform-dependent part: arch/my_ppc
/*	$NetBSD: cpu.c,v 1.1 2000/02/29 15:21:46 nonaka Exp $	*/
/*	$NetBSD: locore.s,v 1.8 2000/11/16 05:38:33 thorpej Exp $	*/
/*	$NetBSD: machdep.c,v 1.11 2000/09/13 15:00:22 thorpej Exp $	*/
in arch/powerpc
/*	$NetBSD: Locore.c,v 1.4 2000/06/08 06:48:45 kleink Exp $	*/
/*	$NetBSD: locore_subr.S,v 1.2 2001/02/28 20:44:41 tsubai Exp $	*/
/*	$NetBSD: mem.c,v 1.9 2001/02/04 17:38:11 briggs Exp $ */
/*	$NetBSD: pmap.c,v 1.44 2001/06/10 11:01:27 tsubai Exp $	*/
/*	$NetBSD: powerpc_machdep.c,v 1.4 2001/04/05 09:58:05 tsubai Exp $	*/
/*	$NetBSD: process_machdep.c,v 1.5 2001/02/04 17:38:11 briggs Exp $	*/
/*	$NetBSD: sys_machdep.c,v 1.3 2000/06/09 14:08:45 kleink Exp $	*/
/*	$NetBSD: trap.c,v 1.46 2001/06/10 16:31:59 tsubai Exp $	*/
/*	$NetBSD: trap_subr.S,v 1.6 2001/06/08 00:16:25 matt Exp $	*/
/*	$NetBSD: trap_subr_mp.S,v 1.2 2001/06/10 11:09:28 tsubai Exp $	*/
/*	$NetBSD: vm_machdep.c,v 1.28 2001/06/10 11:01:28 tsubai Exp $	*/


db> ps /a
 PID          COMMAND      STRUCT PROC *     UAREA *           VMSPACE/VM_MAP
 87       my_asynctst      0x5c33570         0xefbd5000          0x324bd48
 85        my_snmpsub      0x46d1c88         0xefbc2000          0x324ba28
 83            crunch      0x5c331e0         0xefbcd000          0x324bbb8
 82            crunch      0x5c33018         0xefbc8000          0x324baf0
 80        my_snmptst      0x46d1730         0xefbb3000          0x324b7d0
 79           syslogd      0x46d1ac0         0xefbbe000          0x324b960
 77          my_xtest      0x46d18f8         0xefbb9000          0x324b898
 74             my_xp      0x46d1568         0xefbaf000          0x324b708
 72      my_event_tst      0x46d11d8         0xefba5000          0x324b3e8
 68             my_xp      0x46d1010         0xefba1000          0x324b640
 64            crunch      0x3277728         0xefb8f000          0x324b190
 56            crunch      0x3277c80         0xefb9d000          0x324b258
 49            crunch      0x32778f0         0xefb98000          0x324b578
 33            crunch      0x3277ab8         0xefb94000          0x324b320
 12            crunch      0x3277560         0xefb8b000          0x324b0c8
 11          aiodoned      0x3277398         0xefb86000           0x159680
 10           ioflush      0x32771d0         0xefb82000           0x159680
 9             reaper      0x3277008         0xefb7e000           0x159680
 8         pagedaemon      0x3246c78         0xefb7a000           0x159680
 7              nfsio      0x3246ab0         0xefb76000           0x159680
 6              nfsio      0x32468e8         0xefb72000           0x159680
 5              nfsio      0x3246720         0xefb6e000           0x159680
 4              nfsio      0x3246558         0xefb6a000           0x159680
 1             crunch      0x3246000         0xefb5e000          0x324b000
 0            swapper       0x159748           0x1bf000           0x159680
 88            crunch      0x5c33738         0xefbd9000          0x324be10

db>  ps /w
 PID          COMMAND     EMUL  PRI UTIME STIME WAIT-MSG    WAIT-CHANNEL
 87       my_asynctst   netbsd   24   0.1   0.2 select      selwait
 85        my_snmpsub   netbsd    4   2.0   2.1 vmmaplk     kernel_map_store+0x4
 83            crunch   netbsd    4   0.0   0.1 vmmaplk     kernel_map_store+0x4
 82            crunch   netbsd    4   0.1   0.0 vmmaplk     kernel_map_store+0x4
 80        my_snmptst   netbsd    4   0.5   0.4 vmmaplk     kernel_map_store+0x4
 79           syslogd   netbsd    4   0.0   0.1 vmmaplk     kernel_map_store+0x4
 77          my_xtest   netbsd   24   0.0   0.0 select      selwait
 74             my_xp   netbsd    4   0.1   0.3 vmmaplk     kernel_map_store+0x4
 72      my_event_tst   netbsd    4   0.0   0.1 vmmaplk     kernel_map_store+0x4
 68             my_xp   netbsd   40   0.3   0.6 lockf       0xe09f5140
 64            crunch   netbsd   20   0.0   0.2 vnlock       0x5f56c7c
 56            crunch   netbsd   24   0.0   0.0 select      selwait
 49            crunch   netbsd    4   0.7   0.0 vmmaplk     kernel_map_store+0x4
 33            crunch   netbsd   32   0.0   0.1 mfsidl       0x39fbe08
 12            crunch   netbsd    4   0.0   0.3 vmmaplk     kernel_map_store+0x4
 11          aiodoned   netbsd    4   0.0   0.0 aiodoned    uvm+0x34
 10           ioflush   netbsd    4   0.0   0.0 vmmaplk     kernel_map_store+0x4
 9             reaper   netbsd    4   0.0   0.6 vmmaplk     kernel_map_store+0x4
 8         pagedaemon   netbsd    4   0.0   0.0 pgdaemon    uvm+0x28
 7              nfsio   netbsd   32   0.0   0.0 nfsidl      nfs_iodwant+0xc
 6              nfsio   netbsd   32   0.0   0.0 nfsidl      nfs_iodwant+0x8
 5              nfsio   netbsd   32   0.0   0.0 nfsidl      nfs_iodwant+0x4
 4              nfsio   netbsd   32   0.0   0.1 nfsidl      nfs_iodwant
 1             crunch   netbsd   32   0.0   0.1 wait         0x3246000
 0            swapper   netbsd    4   0.0   0.0 scheduler   proc0
 88            crunch   netbsd   50   0.0   0.0
db>
From ps /a
 74             my_xp      0x46d1568         0xefbaf000          0x324b708

 my_xp proc = 0x46d1568 (PID=74)
 0x46d1568:     46d1ac0  #p_forw=syslogd
 				      0  #p_back         
 				46d11d8  #p_list=my_event_tst (first proc)   
 				46d1900     
 				32471c8  #p_cred   
 				3248480  #p_fd   
 				3249090  #p_cwdi   
 			   efbaf128  #p_stats  
 			     159928  #p_limit    
 				324b708  #p_vmspace    
 				43f2c88  #p_sigacts   
 				     14  #p_exitsig
    			   4006  #p_flag      
    			16b4a0   #p_cpu  (->struct schedstate_percpu ci_schedstate) 
    			3000000  #p_stat=3   
    			     4a  #PID (74)       
    			      0           
   			   e0892928    
   			    46d11d8     
   			    46d1778     
   			    3277728     
   			    46d11d8     
   			    46d1784     
   			          0
     				  0           
     				  0           
     				  0           
     				  0           
     				  0           
 			     140a8c      
   			          0           
   			   e8f75ea0    
   			          0           
   			       37d4        
   			    46d1568     
   			          0
                      0           
                      0           
                 1271bc      
                    2ed         
                    29a         
                      0

Find out PCB (ptr to pcb at proc+0x1b8):
db> examine /x 0x46d1568+0x1b8
 0x46d1720:     efbaf000

Display PCB contents:
db> examine /x efbaf000,10
0xefbaf000:     e09e2a80    534da80     efbb2980    a           efbb2ce8    1
0xefbaf018:     ffff        21a238ac    0           0           0           0
0xefbaf030:     0           0           0           
The SPL = 0xa and SP=efbb2980

in locore.s (cpu_switch):

	lwz	1,PCB_SP(4)		/* get new procs SP */

	ori	3,3,PSL_EE@l	/* interrupts are okay again */
	mtmsr	3

	lmw	10,8(1)			/* get other regs */ /*N=32-10=22
	lwz	1,0(1)			/* get saved SP */
	mr	2,12			/* get saved r2 */
	mtcr	11			/* get saved cr */
	isync
	mtsr	USER_SR,10		/* get saved USER_SR */
	isync

switch_return:
	mr	30,7			/* save proc pointer */
	lwz	3,PCB_SPL(4)
	bl	_C_LABEL(lcsplx)

	mr	3,30			/* get curproc for special fork
					   returns */

	lwz	31,12(1)
	lwz	30,8(1)
	addi	1,1,16
	lwz	0,4(1)
	mtlr	0
	blr

The map of the stack as pointed by PCB_SP(4):
+00  R01(sv) ->  +00 
                 +04  
+08  R10         +08  R30
+0C  R11         +0C  R31
 ......          +10  <-Process R1
+5C  R31         +14  <-Process LR

For pid 74 (my_xp)

0xefbb2980 :
+00 efbb29e0 -> 

0xefbb29e0:     
    efbb29f0    
        5164        
     46d1568     R30
           0     R31      
    efbb2a20     R01
       2c1d0     LR  
    3c764d9b    
       cf7ea       
    efbb2a20    
           0           
      140a8c      
        8000
           0           
Call stack my_xp:
                efbb2a20    2c1d0     mi_switch (2c020)
0xefbb2a20:     efbb2a50    2b910     bpendtsleep(2b910) kern_synch.c#474
0xefbb2a50:     efbb2a80    22ca0     lockmgr(229dc)	 kern_lock.c#543
0xefbb2a80:     efbb2bc0    ca440     uvm_fault(ca2d8)	 uvm_fault_i.h#179
0xefbb2bc0:     efbb2c30    e1c98     ->trap (e1b14)	 trap.c#129
0xefbb2c30:     efbb2ce0    578c      ->trapexit
0xefbb2ce0:     efbb2d60    e6e54     ->intrctlr_dispatch (e6de8)
0xefbb2d60:     efbb2d80    29cdc     ->uiomove (29bb0)
0xefbb2d80:     efbb2e10    bd884     ->ffs_write (bd5d8)
0xefbb2e10:     efbb2e60    60944     ->vn_write (607a0)
0xefbb2e60:     efbb2ec0    392c4     ->dofilewrite (391e4)
0xefbb2ec0:     efbb2ee0    391c8     ->sys_write (3912c)
0xefbb2ee0:     efbb2f50    e1f28     ->trap (e1b14)
0xefbb2f50:     ffffe8b0    578c      ->trapexit



db> show uvmexp
Current UVM status:
  pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12
  127784 VM pages: 6899 active, 0 inactive, 117 wired, 110861 free
  min  10% (25) anon, 10% (25) vnode, 5% (12) vtext
  pages  3705 anon, 2383 vnode, 802 vtext
  freemin=64, free-target=85, inactive-target=0, wired-max=42594
  faults=18277, traps=0, intrs=0, ctxswitch=24476
  softint=16374, syscalls=195770, swapins=0, swapouts=0
  fault counts:
    noram=0, noanon=0, pgwait=0, pgrele=0
    ok relocks(total)=762(762), anget(retrys)=2322(0), amapcopy=1961
    neighbor anon/obj pg=1434/23959, gets(lock/unlock)=11232/762
    cases: anon=1464, anoncow=858, obj=7258, prcopy=3974, przero=4324
  daemon and swap counts:
    woke=0, revs=0, scans=0, obscans=0, anscans=0
    busy=0, freed=0, reactivate=0, deactivate=0
    pageouts=0, pending=0, nswget=0
    nswapdev=0, nanon=119381, nanonneeded=119381 nfreeanon=115676
    swpages=0, swpginuse=0, swpgonly=0 paging=0
  kernel pointers:
    objs(kern/kmem/mb)=0x140a24/0x140b20/0x140b38
      0x32
db>
db> call uvmcnt_dump
# map lookup hint hits = 37834
# map lookup calls = 52478
# uvm_map() missed forward = 316
# uvm_map() back merges = 4950
# uvm_map() successful calls = 9201
  0x15bacc
db>

>How-To-Repeat:
	Unknown.
>Fix:
	Unknown.
>Release-Note:
>Audit-Trail:
>Unformatted: