Subject: port-powerpc/16069: [dM] locking lossage
To: None <gnats-bugs@gnats.netbsd.org>
From: der Mouse <mouse@Rodents.Montreal.QC.CA>
List: netbsd-bugs
Date: 03/26/2002 11:16:26
>Number:         16069
>Category:       port-powerpc
>Synopsis:       [dM] locking lossage
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    port-powerpc-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Mar 26 08:18:00 PST 2002
>Closed-Date:
>Last-Modified:
>Originator:     der Mouse
>Release:        Proprietary PPC port derived from 1.5W
>Organization:
	Dis-
>Environment:
	Proprietary PPC port derived from 1.5W
>Description:
	I've been asked to post this by a company that's been working
	with NetBSD.  I don't know much about the problem beyond what's
	here, as I did not see this myself, but I can get mail back to
	the actual originators.  (I realize this may seem to be a
	rather disorganized collection of information; it's what they
	sent me, and as I say, I didn't see it happen myself.  I'm not
	entirely sure why they asked me to post it instead of doing it
	themselves.  I'm trying to help them some in tracking it down
	myself; if anyone has any ideas, I/we would most appreciate
	hearing them.)

Symptoms:
1) kernel panic as a result of DSI trap
2) gdb locked on a "vmmaplk" channel, kernel deadlock, lost console, can only go to db;
   cannot kill deadlocked processes from within db
3) gdb locked on a "uvn_fp2" channel, kernel deadlock, lost console, can only go to db;
   cannot kill deadlocked processes from within db

Proprietary NetBSD PPC port derived from 1.5W

LOCKDEBUG not defined, not a SMP config.
/*	$NetBSD: param.h,v 1.128 2001/06/03 02:48:45 thorpej Exp $	*/
    #define	__NetBSD_Version__	105230000	/* NetBSD 1.5W */
/*	$NetBSD: sys_process.c,v 1.67 2001/03/17 09:38:36 pooka Exp $	*/
/*	$NetBSD: kern_synch.c,v 1.104 2001/05/28 22:20:03 chs Exp $	*/
/*	$NetBSD: kern_lock.c,v 1.55 2001/06/05 04:38:09 thorpej Exp $	*/
/*	$NetBSD: uvm_fault_i.h,v 1.13 2001/06/02 18:09:26 chs Exp $	*/
/*	$NetBSD: uvm_map.h,v 1.28 2001/06/02 18:09:27 chs Exp $	*/
/*	$NetBSD: uvm_fault.c,v 1.64 2001/06/02 18:09:26 chs Exp $	*/
/*	$NetBSD: uvm_map.c,v 1.99 2001/06/02 18:09:26 chs Exp $	*/
/*	$NetBSD: uvm_io.c,v 1.15 2001/06/02 18:09:26 chs Exp $	*/
/*	$NetBSD: uvm_vnode.c,v 1.50 2001/05/26 21:27:21 chs Exp $	*/
/*	$NetBSD: procfs_mem.c,v 1.27 2000/11/24 18:58:37 chs Exp $	*/
/*	$NetBSD: layer_vnops.c,v 1.6 2001/06/07 13:32:47 wiz Exp $	*/

In platform-dependent part: arch/my_ppc
/*	$NetBSD: cpu.c,v 1.1 2000/02/29 15:21:46 nonaka Exp $	*/
/*	$NetBSD: locore.s,v 1.8 2000/11/16 05:38:33 thorpej Exp $	*/
/*	$NetBSD: machdep.c,v 1.11 2000/09/13 15:00:22 thorpej Exp $	*/
in arch/powerpc
/*	$NetBSD: Locore.c,v 1.4 2000/06/08 06:48:45 kleink Exp $	*/
/*	$NetBSD: locore_subr.S,v 1.2 2001/02/28 20:44:41 tsubai Exp $	*/
/*	$NetBSD: mem.c,v 1.9 2001/02/04 17:38:11 briggs Exp $ */
/*	$NetBSD: pmap.c,v 1.44 2001/06/10 11:01:27 tsubai Exp $	*/
/*	$NetBSD: powerpc_machdep.c,v 1.4 2001/04/05 09:58:05 tsubai Exp $	*/
/*	$NetBSD: process_machdep.c,v 1.5 2001/02/04 17:38:11 briggs Exp $	*/
/*	$NetBSD: sys_machdep.c,v 1.3 2000/06/09 14:08:45 kleink Exp $	*/
/*	$NetBSD: trap.c,v 1.46 2001/06/10 16:31:59 tsubai Exp $	*/
/*	$NetBSD: trap_subr.S,v 1.6 2001/06/08 00:16:25 matt Exp $	*/
/*	$NetBSD: trap_subr_mp.S,v 1.2 2001/06/10 11:09:28 tsubai Exp $	*/
/*	$NetBSD: vm_machdep.c,v 1.28 2001/06/10 11:01:28 tsubai Exp $	*/

bash-2.05# gdb my_shlib_test
GNU gdb 4.17
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "powerpc--netbsd"...
(gdb) r -N
Starting program: /usr/local/bin/my_shlib_test -N
trap: kernel read DSI @ 0xe0895c78 by 0x185330 (DSISR 0x40000000)
Stopped in pid 89 (my_shlib_test) at    intrctlr_setpl+0x54:    lwz r10, r11, 0x44,
db>ps
PID             PPID       PGRP        UID S   FLAGS          COMMAND    WAIT
>How-To-Repeat:
	Unknown.
>Fix:
	Unknown.
>Release-Note:
>Audit-Trail:
>Unformatted:
 >89                88         89          0 7  0x5806    my_shlib_test
  88                87         88          0 3  0x4086              gdb    wait
  87                85         87          0 3  0x4086             bash    wait
  85                 1         85          0 3  0x4086           crunch    wait
  84                 1         84          0 3    0x84           crunch  select
  77                 1         77          0 3    0x84           crunch  select
  52                 1         52          0 3    0x84           crunch  mfsidl
  7                  0          0          0 3 0x20204         aiodoned aiodone
  6                  0          0          0 3 0x20204          ioflush  syncer
  5                  0          0          0 3 0x20204           reaper  reaper
  4                  0          0          0 3 0x20204       pagedaemon pgdaemo
  1                  0          1          0 3  0x4084           crunch    wait
  0                 -1          0          0 3 0x20204          swapper schedul
 db> show reg
 r0          0xe0895c34
 r1            0x1f92b0  ddbstk+0x1950
 r2                   0
 r3            0x20f618  extctlr
 r4                 0xb
 r5                0x53  isisize+0x7
 r6                   0
 r7            0x1f9840  ddbstk+0x1ee0
 r8              0xded4  comcnputc
 r9            0x1f0000  intstk+0xff0
 r10           0x185530  intrctlr_mask
 r11         0xe0895c34
 r12         0x44000000
 r13                  0
 r14                  0
 r15                  0
 r16                  0
 r17                  0
 r18                  0
 r19                  0
 r20                  0
 r21         0xffffec14
 r22         0xffffec08
 r23         0x219450d0
 r24         0xffffec8c
 r25         0xffffec64
 r26         0xffffec6c
 r27                  0
 r28         0x44000000
 r29           0x1f9868  ddbstk+0x1f08
 r30              0x700  tlbdsmsize+0x618
 r31           0x1f92b0  ddbstk+0x1950
 iar           0x185330  intrctlr_setpl+0x54
 msr             0x1030  tlbdsmsize+0xf48
 intrctlr_setpl+0x54:    lwz r10, r11, 0x44,
 db>ps /a
  PID          COMMAND      STRUCT PROC *            UAREA *     VMSPACE/VM_MAP
 >89     my_shlib_test          0x3542ab8         0xefb90000          0x32ed640
 
 db> examine 0x3542ab8+0x1b8	(->pcb)
  0x3542c70:     efb90000 (i.e. pcb)
 
 db> exam efb90000,4 (pcb)
 0xefb90000:     e097a480    32fa480     efb93a10    a     
                                         SP          SPL
 db> exam efb93a10
 0xefb93a10:     efb93a30
 db> exam efb93a30,6
 0xefb93a30:     efb93a50   previous R01 
 				  185cf4      
 				       0   R30        
 				    8000   R31     
 				       0   R01  (real R01 not saved?)      
 				  224758   LR 	(illegal instr. here?)
 
 trap: kernel read DSI @ 0xe0895c78 by 0x185330 (DSISR 0x40000000)
 i.e. the translation of the attempted access to 0xe0895c78 was not found in the primary
 hash table entry group (HTEG), in the rehashed secondary HTEG, or in the range
 of a DBAT register (page fault condition).  However, the same address can be examined from db:
 db> exam 0xe0895c78
 0xe0895c78:       19c4a0
 
 0xefb93a50:     efb93a80    1679b8	uvm_pagelookup(uvm_page_i.h#143 after splx(s))
 0xefb93a80:     efb93ad0    17669c	uvn_findpage(uvm_vnode.c#909 after uvm_pagelookup)
 0xefb93ad0:     efb93b10    1765c0	uvn_findpages(uvm_vnode.c#886)
 0xefb93b10:     efb93c70     a2404	genfs_getpages(genfs_vnops.c#517)
 0xefb93c70:     efb93cb0    185640	
 0xefb93cb0:     efb93d00    1859b8	
 0xefb93d00:     efb93d20    159a94	uvmfault_unlockmaps(uvm_fault_i.h#73)
 0xefb93d20:     efb93d40    1599d8	uvmfault_unlockall(uvm_fault_i.h#96)
 0xefb93d40:     efb93eb0    158fc0	uvm_fault(uvm_fault.c#1778)
 0xefb93eb0:     efb93f50    17d358	trap (trap.c#187 case EXC_ISI|EXC_USER )
 0xefb93f50:     ffffeb50      578c	after trapexit
 
 
 ??? at some point the map was invalid
 bash-2.05# gdb my_shlib_test
 GNU gdb 4.17
 Copyright 1998 Free Software Foundation, Inc.
 GDB is free software, covered by the GNU General Public License, and you are
 welcome to change it and/or distribute copies of it under certain conditions.
 Type "show copying" to see the conditions.
 There is absolutely no warranty for GDB.  Type "show warranty" for details.
 This GDB was configured as "powerpc--netbsd"...
 (gdb) shell sysctl -w proc.88.rlimit.datasize.soft=unlimited
 proc.88.rlimit.datasize.soft: 33554432 -> unlimited
 (gdb) shell sysctl -w proc.88.rlimit.stacksize.soft=unlimited
 proc.88.rlimit.stacksize.soft: 1048576 -> unlimited
 (gdb) shell sysctl -w proc.88.rlimit.stacksize.hard=unlimited
 proc.88.rlimit.stacksize.hard: 33554432 -> unlimited
 (gdb) r -N
 Starting program: /usr/local/bin/my_shlib_test -N
 Stopped at      cpu_Debugger+0x18:      lwz r11, r1, 0x0,
 db> ps
  PID             PPID       PGRP        UID S   FLAGS          COMMAND    WAIT
  99                88         99          0 4  0x5806    my_shlib_test
  88                87         88          0 3  0x4006              gdb vmmaplk
  87                85         87          0 3  0x4086             bash    wait
  85                 1         85          0 3  0x4086           crunch    wait
  84                 1         84          0 3    0x84           crunch  select
  77                 1         77          0 3    0x84           crunch  select
  52                 1         52          0 3    0x84           crunch  mfsidl
  7                  0          0          0 3 0x20204         aiodoned aiodone
  6                  0          0          0 3 0x20204          ioflush  syncer
  5                  0          0          0 3 0x20204           reaper  reaper
  4                  0          0          0 3 0x20204       pagedaemon pgdaemo
  1                  0          1          0 3  0x4084           crunch    wait
  0                 -1          0          0 3 0x20204          swapper schedul
 db> ps /w
  PID          COMMAND     EMUL  PRI UTIME STIME WAIT-MSG    WAIT-CHANNEL
  99     my_shlib_test   netbsd   51   0.2   0.3
  88               gdb   netbsd    4   0.3   0.8 vmmaplk     kernel_map_store+0x4
 
  87              bash   netbsd   32   0.0   0.2 wait         0x3542008
  85            crunch   netbsd   32   0.0   0.3 wait         0x32e8c78
  84            crunch   netbsd   24   0.0   0.0 select      selwait
  77            crunch   netbsd   24   0.4   0.0 select      selwait
  52            crunch   netbsd   32   0.0   0.2 mfsidl       0x41c9900
  7           aiodoned   netbsd    4   0.0   0.0 aiodoned    uvm+0x34
  6            ioflush   netbsd   40   0.0   0.0 syncer      rushjob
  5             reaper   netbsd    4   0.0   2.0 reaper      deadproc
  4         pagedaemon   netbsd    4   0.0   0.0 pgdaemon    uvm+0x28
  1             crunch   netbsd   32   0.0   0.1 wait         0x32e8000
  0            swapper   netbsd    4   0.0   0.0 scheduler   proc0
 db> print kernel_map_store+0x4
   1fa09c
 db> ps /a
  PID          COMMAND      STRUCT PROC *            UAREA *     VMSPACE/VM_MAP
  99     my_shlib_test          0x3542728         0xefb8f000          0x32ed258
  88               gdb          0x3542ab8         0xefb8b000          0x32ed640
  87              bash          0x3542008         0xefb7a000          0x32ed190
  85            crunch          0x32e8c78         0xefb76000          0x32ed0c8
  84            crunch          0x3542560         0xefb87000          0x32ed3e8
  77            crunch          0x35428f0         0xefb83000          0x32ed578
  52            crunch          0x3542398         0xefb7f000          0x32ed320
  7           aiodoned          0x32e8ab0         0xefb71000           0x212c90
  6            ioflush          0x32e88e8         0xefb6d000           0x212c90
  5             reaper          0x32e8720         0xefb69000           0x212c90
  4         pagedaemon          0x32e8558         0xefb65000           0x212c90
  1             crunch          0x32e8000         0xefb59000          0x32ed000
  0            swapper           0x212d58           0x266000           0x212c90
 db> show
 all             buf             object          registers       watches
 arptab          map             page            uvmexp
 breaks          ncache          pool            vnode
 db> show map
 MAP 0x1fa098: [0xe0000000->0xf0000000]
         #ent=14, sz=263806976, ref=1, version=4563, flags=0x1
         pmap=0x22470c(resident=12166)
 
 ---------------------------
 0x1fa098 is the address of the kernel's vm_map structure.
 The wait channel address is 0x1fa09c, i.e. &vm_map::lock::lk_interlock.
 One can do:
 db> call wakeup(0x1fa09c) 
 db> ps
 
 db> cont
 It sleeps in lockmgr (kern_lock.c#686); there is an ltsleep there:
 682	          lkp->lk_flags |= LK_WANT_EXCL;
 683	          /*
 684	           * Wait for shared locks and upgrades to finish.
 685	           */
 686 >>>       ACQUIRE(lkp, error, extflags, 0, lkp->lk_sharecount != 0 ||
 687	                 (lkp->lk_flags & LK_WANT_UPGRADE));
 688	          lkp->lk_flags &= ~LK_WANT_EXCL;
 689	          if (error)
 
 
 Another deadlock behavior:
 bash-2.05# gdb my_shlib_test
 GNU gdb 4.17
 Copyright 1998 Free Software Foundation, Inc.
 GDB is free software, covered by the GNU General Public License, and you are
 welcome to change it and/or distribute copies of it under certain conditions.
 Type "show copying" to see the conditions.
 There is absolutely no warranty for GDB.  Type "show warranty" for details.
 This GDB was configured as "powerpc--netbsd"...
 (gdb) r -N
 Starting program: /usr/local/bin/my_shlib_test -N
 Stopped at      cpu_Debugger+0x18:      lwz r11, r1, 0x0,
 db> ps
  PID             PPID       PGRP        UID S   FLAGS          COMMAND    WAIT
  89                88         89          0 4  0x5806    my_shlib_test
  88                87         88          0 3  0x4006              gdb uvn_fp2
  87                85         87          0 3  0x4086             bash    wait
  85                 1         85          0 3  0x4086           crunch    wait
  84                 1         84          0 3    0x84           crunch  select
  77                 1         77          0 3    0x84           crunch  select
  52                 1         52          0 3    0x84           crunch  mfsidl
  7                  0          0          0 3 0x20204         aiodoned aiodone
  6                  0          0          0 3 0x20204          ioflush  syncer
  5                  0          0          0 3 0x20204           reaper  reaper
  4                  0          0          0 3 0x20204       pagedaemon pgdaemo
  1                  0          1          0 3  0x4084           crunch    wait
  0                 -1          0          0 3 0x20204          swapper schedul
 db> cont
 
 stuck in uvn_findpage (uvm_vnode.c#946)