Subject: kern/24410: Deadlock in sys_generic.c/kern_synch.c
To: None <gnats-bugs@gnats.netbsd.org>
From: Christian Biere <christianbiere@gmx.de>
List: netbsd-bugs
Date: 02/13/2004 08:24:35
>Number:         24410
>Category:       kern
>Synopsis:       Deadlock in sys_generic.c/kern_synch.c
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Feb 13 08:25:00 UTC 2004
>Closed-Date:
>Last-Modified:
>Originator:     Christian Biere
>Release:        NetBSD 1.6ZJ
>Organization:
>Environment:
System: NetBSD Cyclonus 1.6ZJ NetBSD 1.6ZJ (STARSCREAM) #3: Wed Feb 11 11:27:20 CET 2004 root@Cyclonus:/usr/src/sys/arch/i386/compile/STARSCREAM i386
Architecture: i386
Machine: i386

     $NetBSD: kern_sa.c,v 1.47 2004/01/02 18:52:17 cl Exp $
     $NetBSD: kern_synch.c,v 1.140 2004/01/04 13:27:53 kleink Exp $
     $NetBSD: sys_generic.c,v 1.80 2003/10/10 15:24:28 chs Exp $
>Description:

Currently my system locks up about once every 24 hours. The lockup is
mostly triggered by leaving the box idle for a while and then accessing
big files on return, e.g. burning a CD or verifying a checksum.

bt
cpu_Debugger(c0320d80,c71f5424,c6c6ce5c,c01ab6e8,c02f51f4) at netbsd:cpu_Debugger+0x4
comintr(c09c9c00,c,c6c60010,c01a0030,c02f0010) at netbsd:comintr+0x5f9
Xintr_ioapic_edge4() at netbsd:Xintr_ioapic_edge4+0x92
--- interrupt ---
netbsd:ffs_genfsops:
db> ps
 PID           PPID     PGRP        UID S   FLAGS LWPS          COMMAND    WAIT
 26984         7348    26984       1002 2  0x4002    1             bash   ttyin
 7348          4682     7348       1002 2  0x4100    1            aterm  select
 6559         15405     6559       1002 2  0x4002    1          openssl uvn_fp2
 15405         9905    15405       1002 2  0x4002    1             bash    wait
 9905          4682     9905       1002 2  0x4100    1            aterm  select
 23919         8123    23919       1000 2  0x400a    1             mutt    poll
 6043          1648     6043       1002 2  0x4002    1            links  select
 8123             1     8123       1000 2  0x4002    1             bash    wait
 8554             1     8554          0 2  0x4002    1             bash   ttyin
 1648          6235     1648       1002 2  0x4002    1             bash    wait
 6235          4682     6235       1002 2  0x4101    1            aterm  select
>How-To-Repeat:

Run X and BitTorrent.
Come back after 8-20 hours.
Run Mozilla.
Run md5 A_BIG_FILE.

>Fix:
>Release-Note:
>Audit-Trail:
>Unformatted:
  29663         7441     7441       1002 2  0x4402    2        python2p2       *
  7441           353     7441       1002 2  0x4002    1               sh    wait
  23977            1    23977          0 2  0x4002    1            getty   ttyin
  1771             1     1771          0 2  0x4002    1            getty   ttyin
  4697           359      359       1002 2  0x4400    3 MozillaFirebird-       *
  359           4682      359       1002 2  0x4000    1               sh    wait
  353           4879      353       1002 2  0x4002    1             bash    wait
  4879          4682     4879       1002 2  0x4101    1            aterm  select
  4682          4917     4917       1002 2  0x4008    1         blackbox  select
  460           4917     4917       1002 2  0x4000    1          gkrellm    poll
  290           4917     4917       1002 2  0x4008    1           bbkeys  select
  4917          4778     4917       1002 2  0x4000    1               sh    wait
  294           4778      294       1002 2  0x4000    1          XFree86  select
  4778             1     4754       1002 2  0x4000    1            xinit    wait
  4462             1     4462       1015 2   0x100    1          syslogd    poll
  4603             1     4603       1007 2   0x100    1             ntpd   pause
  96               1       96          0 2  0x4002    1            getty   ttyin
  4561             1     4561          0 2       0    1             cron nanosle
  236              1      236          0 2       0    1        mount_mfs  mfsidl
  4144             1     4144          0 2       0    1        mount_mfs  mfsidl
  11               1       11          0 2       0    1        mount_mfs  mfsidl
  9                0        0          0 2 0x20200    1         aiodoned aiodone
  8                0        0          0 2 0x20200    1          ioflush  syncer
  7                0        0          0 2 0x20200    1       pagedaemon pgdaemo
  6                0        0          0 2 0x20200    1        atapibus0  sccomp
  5                0        0          0 2 0x20200    1          atabus1   atath
  4                0        0          0 2 0x20200    1          atabus0   atath
  3                0        0          0 2 0x20200    1             pms0 pmsrese
  2                0        0          0 2 0x20200    1           sysmon smtaskq
  1                0        1          0 2  0x4000    1             init    wait
  0               -1        0          0 2 0x20200    1          swapper schedul
 
 db> cont
 Stopped in pid 29663.7 (python2p2) at   netbsd:cpu_Debugger+0x4:        leave
 db> bt
 cpu_Debugger(0,3f9,302c5f8d,7fe,c09cc000) at netbsd:cpu_Debugger+0x4
 comintr(c09c9c00,c,10,c7200030,c71f0010) at netbsd:comintr+0x5f9
 Bad frame pointer: 0xc09c8780
 db> sync
 syncing disks... 
 simple_lock: lock held
 lock: 0xc02f51f4, currently at: ../../../../kern/sys_generic.c:981
 last locked: ../../../../kern/kern_synch.c:421
 last unlocked: ../../../../kern/kern_sa.c:867
 selwakeup(c03211ac,3fd,0,d,c72071d4) at netbsd:selwakeup+0xa1
 logwakeup(c02ca7ca,5,0,0,c6c6cac0) at netbsd:logwakeup+0x9d
 printf(c02ca7ca,0,c6c6cae4,c013cafa,100) at netbsd:printf+0x75
 vfs_shutdown(d,30,c6c60010,c02b8560,d) at netbsd:vfs_shutdown+0x31
 cpu_reboot(100,0,c6c6cbc4,c01712ff,30) at netbsd:cpu_reboot+0x18a
 db_sync_cmd(30,0,168e14,c6c6cb2c,10) at netbsd:db_sync_cmd+0x24
 db_command(c02fd510,c02b8560,c023dcf8,c022ff74,d) at netbsd:db_command+0xef
 db_command_loop(c022ff74,73df,7,c720735d,0) at netbsd:db_command_loop+0x8c
 db_trap(1,0,c0170ca0,c0304920,c022ff74) at netbsd:db_trap+0xdd
 kdb_trap(1,0,c6c6cd80,1,1) at netbsd:kdb_trap+0x12f
 trap() at netbsd:trap+0xda
 --- trap (number 1) ---
 cpu_Debugger(0,3f9,302c5f8d,7fe,c09cc000) at netbsd:cpu_Debugger+0x4
 comintr(c09c9c00,c,10,c7200030,c71f0010) at netbsd:comintr+0x5f9
 Bad frame pointer: 0xc09c8780
 ~~wdc_atapi_intr: unknown phase 0x1
 done
 unmounting /c (/dev/cgd0a)...
 panic: ltsleep: l_stat 8 != LSONPROC
 Stopped in pid 29663.7 (python2p2) at   netbsd:cpu_Debugger+0x4:        leave
 db> sync
 
 dumping to dev 0,1 offset 9095
 dump panic: wddump: polled command has been queued
 Stopped in pid 29663.7 (python2p2) at   netbsd:cpu_Debugger+0x4:        leave
 db> sync
 
 dumping to dev 0,1 offset 9095
 dump device not ready
 
 
 panic: wdc_exec_command: polled command not done
 Stopped in pid 29663.7 (python2p2) at   netbsd:cpu_Debugger+0x4:        leave
 db> sync
 
 dumping to dev 0,1 offset 9095
 dump device not ready
 
 
 panic: kernel diagnostic assertion "_simple_lock_held((&sched_lock)) == 0" failed: file "../../../../kern/kern_synch.c", line 679
 Stopped in pid 29663.7 (python2p2) at   netbsd:cpu_Debugger+0x4:        leave
 db> reboot
 rebooting...
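
For readers unfamiliar with the "simple_lock: lock held" diagnostic in the
trace above, here is a minimal user-space sketch of how a debugging spin lock
can produce that kind of report. It is not the NetBSD implementation: the
names dbg_lock, dbg_lock_acquire, dbg_lock_release and
replay_reported_sequence are hypothetical, and the model is single-threaded.
It only illustrates the bookkeeping (file and line of the last lock and
unlock) and replays the three locations printed in the report. Whether lock
0xc02f51f4 is sched_lock is not established by this report.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical user-space model of a debugging spin lock. This is an
 * illustration only, not the kernel's struct simplelock. */
struct dbg_lock {
	int held;			/* nonzero while the lock is taken */
	const char *lock_file;		/* where it was last locked */
	int lock_line;
	const char *unlock_file;	/* where it was last unlocked */
	int unlock_line;
};

#define DBG_LOCK_INIT { 0, "(never)", 0, "(never)", 0 }

/* Take the lock. Returns 0 on success. If the lock is already held,
 * print a report shaped like the one in the trace and return -1
 * instead of spinning forever. */
static int
dbg_lock_acquire(struct dbg_lock *l, const char *file, int line)
{
	if (l->held) {
		printf("simple_lock: lock held\n");
		printf("lock: %p, currently at: %s:%d\n",
		    (void *)l, file, line);
		printf("last locked: %s:%d\n", l->lock_file, l->lock_line);
		printf("last unlocked: %s:%d\n",
		    l->unlock_file, l->unlock_line);
		return -1;
	}
	l->held = 1;
	l->lock_file = file;
	l->lock_line = line;
	return 0;
}

/* Release the lock and remember where that happened. */
static void
dbg_lock_release(struct dbg_lock *l, const char *file, int line)
{
	assert(l->held);
	l->held = 0;
	l->unlock_file = file;
	l->unlock_line = line;
}

/* Replay the locations from the report: unlocked at kern_sa.c:867,
 * locked at kern_synch.c:421, then a second acquire attempt at
 * sys_generic.c:981 while still held. The first acquire location is
 * a placeholder; the report does not show it. */
static int
replay_reported_sequence(void)
{
	struct dbg_lock l = DBG_LOCK_INIT;
	int rc;

	rc = dbg_lock_acquire(&l, "(earlier, not in the report)", 0);
	assert(rc == 0);
	dbg_lock_release(&l, "kern_sa.c", 867);
	rc = dbg_lock_acquire(&l, "kern_synch.c", 421);
	assert(rc == 0);
	/* Second acquire without a release in between: reported. */
	return dbg_lock_acquire(&l, "sys_generic.c", 981);
}
```

Calling replay_reported_sequence() from a small main() prints a four-line
report in the same shape as the one above and returns -1. On a real
non-recursive spin lock without such a check, the second acquire would spin
forever, which matches the observed lockup only under the assumption that the
same CPU re-enters the locked region.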