Current-Users archive


Re: Killing a zombie process?



On 15 Oct 2015, at 00:21, Rhialto <rhialto%falu.nl@localhost> wrote:

> On Wed 14 Oct 2015 at 09:39:40 +0200, J. Hannken-Illjes wrote:
>> Looks like a deadlock, two threads in tstile.
>> 
>> Please take a backtrace (with arguments) of these threads.
> 
> I've got a whole lot more in tstile, and that is even just from running
> pkg_comp in the chroot. I didn't try to interrupt anything yet.
> 
> load averages:  0.00,  0.20,  0.44;               up 0+02:23:43        22:43:52
> 78 processes: 76 sleeping, 2 on CPU
> CPU states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
> Memory: 393M Act, 60K Inact, 31M Wired, 31M Exec, 273M File, 3239M Free
> Swap: 4096M Total, 4096M Free
> 
> 
> vargaz:~$ ps alxtp1
> UID   PID  PPID   CPU PRI NI   VSZ   RSS WCHAN   STAT TTY      TIME COMMAND
> 1000  1391    74     0  85  0 13208  2528 wait    Is   ttyp1 0:00.02 -bash
>   0  1759  1391  1107  85  0 13304  1576 wait    I    ttyp1 0:00.13 /bin/sh /usr/pkg/sbin/pkg_comp chroot
>   0   865  1759  1107  85  0 13304  1140 wait    I    ttyp1 0:00.01 /bin/sh /pkg_comp/tmp/pkg_comp-sOjsoA.sh
>   0   874   865 13547  82  0 11088  1412 pause   I    ttyp1 0:00.01 /bin/ksh
>   0   267   874 20048  81  0 15360  1720 wait    I+   ttyp1 0:00.22 /bin/sh -e /usr/pkg/sbin/pkg_chk
>   0  9782   267 20048  81  0 15360  1448 wait    I+   ttyp1 0:00.00 sh -c cd /usr/pkgsrc/devel/mercurial && /usr/bin/make u
>   0  8085  9782     0 117  0 15224  3452 tstile  D+   ttyp1 0:00.14 /usr/bin/make update CLEANDEPENDS
>   0 26889  8085 29745  78  0 15360  1424 wait    I+   ttyp1 0:00.00 /bin/sh -c set -e; /usr/bin/env MAKECONF=/etc/mk.conf P
>   0 14050 26889     0 117  0 15224  3444 tstile  D+   ttyp1 0:00.14 /usr/bin/make _MAKE OPSYS OS_VERSION LOWER_OPSYS _PKGSR
>   0  6325 14050 22699  80  0 15360  1428 wait    I+   ttyp1 0:00.00 /bin/sh -c set -e; pkgpattern=mercurial-3.5.1;\t\t\t\t
>   0 13334  6325     0 117  0 15224  3452 tstile  D+   ttyp1 0:00.14 /usr/bin/make .MAKE.LEVEL.ENV CLEANDEPENDS HOST_OSTYPE
>   0  2892 13334 29745  78  0 15364  1444 wait    I+   ttyp1 0:00.00 /bin/sh -c set -e;\t\t\t\t\t\t\t\t exec 3<&0;\t\t\t\t\t
>   0 13425  2892 29745  78  0 15364  1136 wait    I+   ttyp1 0:00.00 /bin/sh -c set -e;\t\t\t\t\t\t\t\t exec 3<&0;\t\t\t\t\t
>   0 17339 13425     0 117  0 15224  3504 tstile  D+   ttyp1 0:00.16 /usr/bin/make .MAKE.LEVEL.ENV CLEANDEPENDS DEPENDS_TARG
>   0 11893 17339 23601  80  0 15364  1432 wait    I+   ttyp1 0:00.00 /bin/sh -c set -e; pkgpattern=py27-mercurial\\>=3.5.1;\
>   0 21797 11893     0 117  0 15228  3512 tstile  D+   ttyp1 0:00.18 /usr/bin/make .MAKE.LEVEL.ENV CLEANDEPENDS DEPENDS_TARG
>   0  1347 21797 23778  80  0 15364  1456 wait    I+   ttyp1 0:00.00 /bin/sh -c set -e;\t\t\t\t\t if test -n "" &&  /usr/pkg
>   0 23567  1347     0 117  0 15228  4032 tstile  D+   ttyp1 0:00.38 /usr/bin/make .MAKE.LEVEL.ENV CLEANDEPENDS DEPENDS_TARG
>   0  3383 23567 29360  78  0 15364  1432 wait    I+   ttyp1 0:00.00 /bin/sh -c (cd /pkg_comp/obj/pkgsrc/devel/py-mercurial/
>   0 21311  3383 28277  79  0 81652 11580 wait    I+   ttyp1 0:00.14 /usr/pkg/bin/python2.7 setup.py build
>   0 24114 21311 28277  79  0 15364  1424 wait    I+   ttyp1 0:00.01 /bin/sh /pkg_comp/obj/pkgsrc/devel/py-mercurial/default
>   0  3590 24114 28277  79  0 15364  1472 wait    I+   ttyp1 0:00.00 /bin/sh /usr/pkgsrc/mk/tools/msgfmt.sh
>   0  7060  3590 28277 117  0  4244   188 tstile  D+   ttyp1 0:00.00 /bin/cat
>   0 18497  3590 28277  79  0 10880  1064 pipe_wr I+   ttyp1 0:00.00 /bin/cat i18n/el.po
>   0 23883  3590     0 117  0  6580   236 netio   D+   ttyp1 0:00.00 /usr/bin/msgfmt -v -o mercurial/locale/el/LC_MESSAGES/h
>   0 27257  3590 28277 117  0  4244   188 tstile  D+   ttyp1 0:00.00 /bin/cat
>   0 29472  3590 28277  79  0 14244  2344 pipe_wr I+   ttyp1 0:00.01 /usr/bin/awk -f /usr/bin/awk
> 
> (I've re-arranged the order to get parents before children)
> 
> Here are backtraces of the processes in tstile (and the shell that
> spawned the 4 leaf children). I have kept the dump so I can examine it
> further.
> 
> Unfortunately, crash(8) didn't give me arguments, nor did ddb when I
> tried that (I used the GENERIC kernel, what options do I need to get the
> arguments?)
> 
> Script started on Wed Oct 14 23:41:43 2015
> vargaz:~/crash$ crash -M netbsd.3.core -N netbsd.test
> Crash version 7.0, image version 7.99.21.
> WARNING: versions differ, you may not be able to examine this image.
> System panicked: dump forced via kernel debugger
> Backtrace from time of crash is available.
> 
> 
> crash> bt/t 0t3590
> trace: pid 3590 lid 1 at 0xfffffe8040758d00
> sleepq_block() at sleepq_block+0xa2
> cv_wait_sig() at cv_wait_sig+0xfe
> do_sys_wait() at do_sys_wait+0x22c
> sys___wait450() at sys___wait450+0x3a
> syscall() at syscall+0x9c
> --- syscall (number 449) ---
> 7f7ff683c1ea:
> 
> 
> crash> bt/t 0t7060
> trace: pid 7060 lid 1 at 0xfffffe804076c770
> sleepq_block() at sleepq_block+0xa2
> turnstile_block() at turnstile_block+0x40e
> rw_vector_enter() at rw_vector_enter+0x2d0
> genfs_lock() at genfs_lock+0x7b
> VOP_LOCK() at VOP_LOCK+0x54
> vn_lock() at vn_lock+0x82
> nfs_lookup() at nfs_lookup+0xfb4
> VOP_LOOKUP() at VOP_LOOKUP+0xa8
> lookup_once() at lookup_once+0x216
> namei_tryemulroot() at namei_tryemulroot+0x5b0
> namei() at namei+0x29
> vn_open() at vn_open+0x8e
> do_open() at do_open+0x111
> do_sys_openat() at do_sys_openat+0x68
> sys_open() at sys_open+0x24
> syscall() at syscall+0x9c
> --- syscall (number 5) ---
> 7f7ff7c0c20a:
> 
> 
> crash> bt/t 0t27257
> trace: pid 27257 lid 1 at 0xfffffe8040748770
> sleepq_block() at sleepq_block+0xa2
> turnstile_block() at turnstile_block+0x40e
> rw_vector_enter() at rw_vector_enter+0x2d0
> genfs_lock() at genfs_lock+0x7b
> VOP_LOCK() at VOP_LOCK+0x54
> vn_lock() at vn_lock+0x82
> nfs_lookup() at nfs_lookup+0xfb4
> VOP_LOOKUP() at VOP_LOOKUP+0xa8
> lookup_once() at lookup_once+0x216
> namei_tryemulroot() at namei_tryemulroot+0x5b0
> namei() at namei+0x29
> vn_open() at vn_open+0x8e
> do_open() at do_open+0x111
> do_sys_openat() at do_sys_openat+0x68
> sys_open() at sys_open+0x24
> syscall() at syscall+0x9c
> --- syscall (number 5) ---
> 7f7ff7c0c20a:
> 
> 
> crash> bt/t 0t23567
> trace: pid 23567 lid 1 at 0xfffffe8040734c60
> sleepq_block() at sleepq_block+0xa2
> turnstile_block() at turnstile_block+0x40e
> rw_vector_enter() at rw_vector_enter+0x2d0
> genfs_lock() at genfs_lock+0x7b
> VOP_LOCK() at VOP_LOCK+0x54
> vn_lock() at vn_lock+0x82
> getcwd_common() at getcwd_common+0x2cd
> sys___getcwd() at sys___getcwd+0xae
> syscall() at syscall+0x9c
> --- syscall (number 296) ---
> 7f7ff6c9f6ba:
> 
> 
> crash> bt/t 0t21797
> trace: pid 21797 lid 1 at 0xfffffe804073cc60
> sleepq_block() at sleepq_block+0xa2
> turnstile_block() at turnstile_block+0x40e
> rw_vector_enter() at rw_vector_enter+0x2d0
> genfs_lock() at genfs_lock+0x7b
> VOP_LOCK() at VOP_LOCK+0x54
> vn_lock() at vn_lock+0x82
> getcwd_common() at getcwd_common+0x2cd
> sys___getcwd() at sys___getcwd+0xae
> syscall() at syscall+0x9c
> --- syscall (number 296) ---
> 7f7ff6c9f6ba:
> 
> 
> crash> bt/t 0t17339
> trace: pid 17339 lid 1 at 0xfffffe80407a4c60
> sleepq_block() at sleepq_block+0xa2
> turnstile_block() at turnstile_block+0x40e
> rw_vector_enter() at rw_vector_enter+0x2d0
> genfs_lock() at genfs_lock+0x7b
> VOP_LOCK() at VOP_LOCK+0x54
> vn_lock() at vn_lock+0x82
> getcwd_common() at getcwd_common+0x2cd
> sys___getcwd() at sys___getcwd+0xae
> syscall() at syscall+0x9c
> --- syscall (number 296) ---
> 7f7ff6c9f6ba:
> 
> 
> crash> bt/t 0t13334
> trace: pid 13334 lid 1 at 0xfffffe80406b0c60
> sleepq_block() at sleepq_block+0xa2
> turnstile_block() at turnstile_block+0x40e
> rw_vector_enter() at rw_vector_enter+0x2d0
> genfs_lock() at genfs_lock+0x7b
> VOP_LOCK() at VOP_LOCK+0x54
> vn_lock() at vn_lock+0x82
> getcwd_common() at getcwd_common+0x2cd
> sys___getcwd() at sys___getcwd+0xae
> syscall() at syscall+0x9c
> --- syscall (number 296) ---
> 7f7ff6c9f6ba:
> 
> 
> crash> bt/t 0t14050
> trace: pid 14050 lid 1 at 0xfffffe8040778c60
> sleepq_block() at sleepq_block+0xa2
> turnstile_block() at turnstile_block+0x40e
> rw_vector_enter() at rw_vector_enter+0x2d0
> genfs_lock() at genfs_lock+0x7b
> VOP_LOCK() at VOP_LOCK+0x54
> vn_lock() at vn_lock+0x82
> getcwd_common() at getcwd_common+0x2cd
> sys___getcwd() at sys___getcwd+0xae
> syscall() at syscall+0x9c
> --- syscall (number 296) ---
> 7f7ff6c9f6ba:
> 
> 
> crash> bt/t 0t8085
> trace: pid 8085 lid 1 at 0xfffffe80406ecc60
> sleepq_block() at sleepq_block+0xa2
> turnstile_block() at turnstile_block+0x40e
> rw_vector_enter() at rw_vector_enter+0x2d0
> genfs_lock() at genfs_lock+0x7b
> VOP_LOCK() at VOP_LOCK+0x54
> vn_lock() at vn_lock+0x82
> getcwd_common() at getcwd_common+0x2cd
> sys___getcwd() at sys___getcwd+0xae
> syscall() at syscall+0x9c
> --- syscall (number 296) ---
> 7f7ff6c9f6ba:
> crash> ^Dvargaz:~/crash$ exit
> 
> Script done on Wed Oct 14 23:48:44 2015
> 
> Note the complicated mount points, which might make any bugs in locking
> more likely to pop up: the usual null mounts from pkg_comp, but with an
> additional mount of a local directory for actually building in (so that
> that doesn't need to go over NFS).
> 
> /dev/wd0a on / type ffs (local)
> /dev/wd0f on /var type ffs (log, local)
> /dev/wd0e on /usr type ffs (log, local)
> /dev/wd0g on /home type ffs (log, local)
> /dev/wd0h on /tmp type ffs (log, local)
> kernfs on /kern type kernfs (local)
> ptyfs on /dev/pts type ptyfs (local)
> procfs on /proc type procfs (local)
> procfs on /usr/pkg/emul/linux32/proc type procfs (read-only, local)
> nfsserver:/mnt/vol1 on /mnt/vol1 type nfs
> nfsserver:/mnt/scratch on /mnt/scratch type nfs
> tmpfs on /var/shm type tmpfs (local)
> /mnt/vol1/rhialto/cvs/src on /mnt/scratch/scratch/chroot/pkg_comp.amd64-7.0/default/usr/src type null (read-only)
> /mnt/vol1/rhialto/cvs/pkgsrc on /mnt/scratch/scratch/chroot/pkg_comp.amd64-7.0/default/usr/pkgsrc type null (read-only)
> /mnt/vol1/distfiles on /mnt/scratch/scratch/chroot/pkg_comp.amd64-7.0/default/pkg_comp/distfiles type null
> /mnt/scratch/scratch/packages.amd64-7.0 on /mnt/scratch/scratch/chroot/pkg_comp.amd64-7.0/default/pkg_comp/packages type null
> /home/rhialto/obj on /mnt/scratch/scratch/chroot/pkg_comp.amd64-7.0/default/pkg_comp/obj/pkgsrc type null (local)
> procfs on /usr/pkg/emul/linux32/proc type procfs (local)

Examining the crash dump further I get:

Threads 23567, 21797, 17339, 13334, 14050 and 8085 want vnode 0xfffffe81318ae748.
Thread 7060 holds vnode 0xfffffe81318ae748 and wants vnode 0xfffffe811c0fce50.
Thread 27257 holds vnode 0xfffffe811c0fce50 and wants vnode 0xfffffe811c0fc760.

Vnode 0xfffffe811c0fc760 is (v_size 55295, v_flag VV_MAPPED | VI_EXECMAP, v_type = VREG, v_tag = VT_NFS on /mnt/scratch).

Thread 23883 holds vnode 0xfffffe811c0fc760; its trace is:

 0 mi_switch (l=l@entry=0xfffffe810907ab60)
 1 sleepq_block (timo=timo@entry=500, catch_p=catch_p@entry=false)
 2 cv_timedwait (cv=cv@entry=0xfffffe8135dfbcf8, mtx=mtx@entry=0xfffffe813fdaaf40, timo=500)
 3 sbwait (sb=sb@entry=0xfffffe8135dfbcb0)
 4 soreceive (so=0xfffffe8135dfbb68, paddr=0xfffffe8040710b60, uio=0xfffffe8040710b98, mp0=<optimized out>, controlp=0x0, flagsp=0xfffffe8040710b2c)
 5 nfs_receive (l=0xfffffe810907ab60, mp=0xfffffe8040710b58, aname=0xfffffe8040710b60, rep=0xfffffe813eef1cf0)
 6 nfs_reply (lwp=0xfffffe810907ab60, myrep=0xfffffe813eef1cf0)
 7 nfs_request (np=np@entry=0xfffffe812bc3a6a0, mrest=mrest@entry=0xfffffe811a75e000, procnum=procnum@entry=1, lwp=0xfffffe810907ab60, cred=0xfffffe8124face40, mrp=mrp@entry=0xfffffe8040710c50, mdp=mdp@entry=0xfffffe8040710c58, dposp=dposp@entry=0xfffffe8040710c48, rexmitp=rexmitp@entry=0x0)
 8 nfs_getattr (v=0xfffffe8040710cb0)
 9 VOP_GETATTR (vp=vp@entry=0xfffffe811c0fc760, vap=vap@entry=0xfffffe8040710ce8, cred=<optimized out>)
10 vn_stat (vp=vp@entry=0xfffffe811c0fc760, sb=sb@entry=0xfffffe8040710e00)
11 vn_statfile (fp=<optimized out>, sb=0xfffffe8040710e00)
12 do_sys_fstat (fd=4, sb=sb@entry=0xfffffe8040710e00)
13 sys___fstat50 (l=<optimized out>, uap=0xfffffe8040710f00, retval=<optimized out>)
14 sy_call (rval=0xfffffe8040710eb8, uap=0xfffffe8040710f00, l=0xfffffe810907ab60, sy=0xffffffff81108c60 <sysent+10560>)
15 sy_invoke (code=440, rval=0xfffffe8040710eb8, uap=0xfffffe8040710f00, l=0xfffffe810907ab60, sy=0xffffffff81108c60 <sysent+10560>)
16 syscall (frame=0xfffffe8040710f00)
17 Xsyscall ()

Looks like we are waiting for an NFS operation (the getattr request in frames 5-8) to complete.
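
For anyone who wants to redo this kind of chain-following on their own dump, here is a minimal sketch in Python. The pids and vnode addresses are copied from the list above; the two dicts and the helper names (blocking_chain, is_cycle) are only my illustration and not something crash(8) or gdb provides. It walks "wants vnode -> held by" until it reaches a thread that is not waiting for a vnode lock, or until it revisits a thread (which would be a real lock cycle).

```python
# Wait-for data transcribed by hand from the crash dump analysis.
# The dict layout is an assumption made for this sketch only.
VNODE_A = 0xfffffe81318ae748
VNODE_B = 0xfffffe811c0fce50
VNODE_C = 0xfffffe811c0fc760

# pid -> vnode whose lock it is blocked on
wants = {
    23567: VNODE_A, 21797: VNODE_A, 17339: VNODE_A,
    13334: VNODE_A, 14050: VNODE_A, 8085: VNODE_A,
    7060: VNODE_B,
    27257: VNODE_C,
}

# vnode -> pid currently holding its lock
holds = {
    VNODE_A: 7060,
    VNODE_B: 27257,
    VNODE_C: 23883,
}


def blocking_chain(pid):
    """Follow wants/holds links from pid; stop at a non-waiter or a repeat."""
    chain = [pid]
    seen = {pid}
    while chain[-1] in wants:
        holder = holds.get(wants[chain[-1]])
        if holder is None:
            break  # lock holder unknown, cannot follow further
        chain.append(holder)
        if holder in seen:
            break  # same thread seen twice: a lock cycle
        seen.add(holder)
    return chain


def is_cycle(chain):
    """True if the last entry already appears earlier in the chain."""
    return len(chain) > 1 and chain[-1] in chain[:-1]


if __name__ == "__main__":
    for pid in sorted(wants):
        chain = blocking_chain(pid)
        kind = "cycle" if is_cycle(chain) else "ends at %d" % chain[-1]
        print(" -> ".join(str(p) for p in chain), "(%s)" % kind)
```

With the data above every chain ends at 23883 without a cycle, which matches the reading that this is not a lock-order deadlock among the tstile threads but a pile-up behind one thread stuck in an NFS request.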

Did the machine hang here?

--
J. Hannken-Illjes - hannken%eis.cs.tu-bs.de@localhost - TU Braunschweig (Germany)



