Current-Users archive


Re: Killing a zombie process?



On Wed 14 Oct 2015 at 09:39:40 +0200, J. Hannken-Illjes wrote:
> Looks like a deadlock, two threads in tstile.
> 
> Please take a backtrace (with arguments) of these threads.

I have a lot more than two in tstile, and that is just from running
pkg_comp in the chroot. I haven't tried to interrupt anything yet.

load averages:  0.00,  0.20,  0.44;               up 0+02:23:43        22:43:52
78 processes: 76 sleeping, 2 on CPU
CPU states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
Memory: 393M Act, 60K Inact, 31M Wired, 31M Exec, 273M File, 3239M Free
Swap: 4096M Total, 4096M Free


vargaz:~$ ps alxtp1
 UID   PID  PPID   CPU PRI NI   VSZ   RSS WCHAN   STAT TTY      TIME COMMAND
1000  1391    74     0  85  0 13208  2528 wait    Is   ttyp1 0:00.02 -bash 
   0  1759  1391  1107  85  0 13304  1576 wait    I    ttyp1 0:00.13 /bin/sh /usr/pkg/sbin/pkg_comp chroot 
   0   865  1759  1107  85  0 13304  1140 wait    I    ttyp1 0:00.01 /bin/sh /pkg_comp/tmp/pkg_comp-sOjsoA.sh 
   0   874   865 13547  82  0 11088  1412 pause   I    ttyp1 0:00.01 /bin/ksh 
   0   267   874 20048  81  0 15360  1720 wait    I+   ttyp1 0:00.22 /bin/sh -e /usr/pkg/sbin/pkg_chk 
   0  9782   267 20048  81  0 15360  1448 wait    I+   ttyp1 0:00.00 sh -c cd /usr/pkgsrc/devel/mercurial && /usr/bin/make u
   0  8085  9782     0 117  0 15224  3452 tstile  D+   ttyp1 0:00.14 /usr/bin/make update CLEANDEPENDS 
   0 26889  8085 29745  78  0 15360  1424 wait    I+   ttyp1 0:00.00 /bin/sh -c set -e; /usr/bin/env MAKECONF=/etc/mk.conf P
   0 14050 26889     0 117  0 15224  3444 tstile  D+   ttyp1 0:00.14 /usr/bin/make _MAKE OPSYS OS_VERSION LOWER_OPSYS _PKGSR
   0  6325 14050 22699  80  0 15360  1428 wait    I+   ttyp1 0:00.00 /bin/sh -c set -e; pkgpattern=mercurial-3.5.1;\t\t\t\t 
   0 13334  6325     0 117  0 15224  3452 tstile  D+   ttyp1 0:00.14 /usr/bin/make .MAKE.LEVEL.ENV CLEANDEPENDS HOST_OSTYPE 
   0  2892 13334 29745  78  0 15364  1444 wait    I+   ttyp1 0:00.00 /bin/sh -c set -e;\t\t\t\t\t\t\t\t exec 3<&0;\t\t\t\t\t
   0 13425  2892 29745  78  0 15364  1136 wait    I+   ttyp1 0:00.00 /bin/sh -c set -e;\t\t\t\t\t\t\t\t exec 3<&0;\t\t\t\t\t
   0 17339 13425     0 117  0 15224  3504 tstile  D+   ttyp1 0:00.16 /usr/bin/make .MAKE.LEVEL.ENV CLEANDEPENDS DEPENDS_TARG
   0 11893 17339 23601  80  0 15364  1432 wait    I+   ttyp1 0:00.00 /bin/sh -c set -e; pkgpattern=py27-mercurial\\>=3.5.1;\
   0 21797 11893     0 117  0 15228  3512 tstile  D+   ttyp1 0:00.18 /usr/bin/make .MAKE.LEVEL.ENV CLEANDEPENDS DEPENDS_TARG
   0  1347 21797 23778  80  0 15364  1456 wait    I+   ttyp1 0:00.00 /bin/sh -c set -e;\t\t\t\t\t if test -n "" &&  /usr/pkg
   0 23567  1347     0 117  0 15228  4032 tstile  D+   ttyp1 0:00.38 /usr/bin/make .MAKE.LEVEL.ENV CLEANDEPENDS DEPENDS_TARG
   0  3383 23567 29360  78  0 15364  1432 wait    I+   ttyp1 0:00.00 /bin/sh -c (cd /pkg_comp/obj/pkgsrc/devel/py-mercurial/
   0 21311  3383 28277  79  0 81652 11580 wait    I+   ttyp1 0:00.14 /usr/pkg/bin/python2.7 setup.py build 
   0 24114 21311 28277  79  0 15364  1424 wait    I+   ttyp1 0:00.01 /bin/sh /pkg_comp/obj/pkgsrc/devel/py-mercurial/default
   0  3590 24114 28277  79  0 15364  1472 wait    I+   ttyp1 0:00.00 /bin/sh /usr/pkgsrc/mk/tools/msgfmt.sh 
   0  7060  3590 28277 117  0  4244   188 tstile  D+   ttyp1 0:00.00 /bin/cat 
   0 18497  3590 28277  79  0 10880  1064 pipe_wr I+   ttyp1 0:00.00 /bin/cat i18n/el.po 
   0 23883  3590     0 117  0  6580   236 netio   D+   ttyp1 0:00.00 /usr/bin/msgfmt -v -o mercurial/locale/el/LC_MESSAGES/h
   0 27257  3590 28277 117  0  4244   188 tstile  D+   ttyp1 0:00.00 /bin/cat 
   0 29472  3590 28277  79  0 14244  2344 pipe_wr I+   ttyp1 0:00.01 /usr/bin/awk -f /usr/bin/awk 

(I've re-arranged the order to get parents before children)
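
For anyone who wants to automate that re-ordering, here is a minimal
sketch in Python. It is not what was used for the listing above; the
function name parents_first and the (pid, ppid) input format are
assumptions for illustration, and the sample rows are a small excerpt of
the PID/PPID columns.

```python
from collections import defaultdict

def parents_first(rows):
    """Order (pid, ppid) pairs so every parent precedes its children."""
    pids = {pid for pid, _ in rows}
    children = defaultdict(list)
    roots = []
    for pid, ppid in rows:
        if ppid in pids:
            children[ppid].append(pid)
        else:
            # Parent is not part of the listing: treat this process as a root.
            roots.append(pid)
    ordered = []
    stack = list(reversed(roots))
    while stack:
        # Depth-first walk, so each subtree stays together under its parent.
        pid = stack.pop()
        ordered.append(pid)
        stack.extend(reversed(children[pid]))
    return ordered

# Hypothetical input: a few (PID, PPID) pairs from the ps output, shuffled.
rows = [(7060, 3590), (865, 1759), (3590, 24114), (1759, 1391), (1391, 74)]
order = parents_first(rows)
print(order)
```

The depth-first walk keeps siblings in their input order and lists each
child directly below its own subtree's parent, which matches the layout
of the listing.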

Here are backtraces of the processes in tstile (and the shell that
spawned the 4 leaf children). I have kept the dump so I can examine it
further.

Unfortunately, crash(8) didn't give me arguments, and neither did ddb
when I tried that. (I used the GENERIC kernel; what options do I need to
get the arguments?)

Script started on Wed Oct 14 23:41:43 2015
vargaz:~/crash$ crash -M netbsd.3.core -N netbsd.test
Crash version 7.0, image version 7.99.21.
WARNING: versions differ, you may not be able to examine this image.
System panicked: dump forced via kernel debugger
Backtrace from time of crash is available.


crash> bt/t 0t3590
trace: pid 3590 lid 1 at 0xfffffe8040758d00
sleepq_block() at sleepq_block+0xa2
cv_wait_sig() at cv_wait_sig+0xfe
do_sys_wait() at do_sys_wait+0x22c
sys___wait450() at sys___wait450+0x3a
syscall() at syscall+0x9c
--- syscall (number 449) ---
7f7ff683c1ea:


crash> bt/t 0t7060
trace: pid 7060 lid 1 at 0xfffffe804076c770
sleepq_block() at sleepq_block+0xa2
turnstile_block() at turnstile_block+0x40e
rw_vector_enter() at rw_vector_enter+0x2d0
genfs_lock() at genfs_lock+0x7b
VOP_LOCK() at VOP_LOCK+0x54
vn_lock() at vn_lock+0x82
nfs_lookup() at nfs_lookup+0xfb4
VOP_LOOKUP() at VOP_LOOKUP+0xa8
lookup_once() at lookup_once+0x216
namei_tryemulroot() at namei_tryemulroot+0x5b0
namei() at namei+0x29
vn_open() at vn_open+0x8e
do_open() at do_open+0x111
do_sys_openat() at do_sys_openat+0x68
sys_open() at sys_open+0x24
syscall() at syscall+0x9c
--- syscall (number 5) ---
7f7ff7c0c20a:


crash> bt/t 0t27257
trace: pid 27257 lid 1 at 0xfffffe8040748770
sleepq_block() at sleepq_block+0xa2
turnstile_block() at turnstile_block+0x40e
rw_vector_enter() at rw_vector_enter+0x2d0
genfs_lock() at genfs_lock+0x7b
VOP_LOCK() at VOP_LOCK+0x54
vn_lock() at vn_lock+0x82
nfs_lookup() at nfs_lookup+0xfb4
VOP_LOOKUP() at VOP_LOOKUP+0xa8
lookup_once() at lookup_once+0x216
namei_tryemulroot() at namei_tryemulroot+0x5b0
namei() at namei+0x29
vn_open() at vn_open+0x8e
do_open() at do_open+0x111
do_sys_openat() at do_sys_openat+0x68
sys_open() at sys_open+0x24
syscall() at syscall+0x9c
--- syscall (number 5) ---
7f7ff7c0c20a:


crash> bt/t 0t23567
trace: pid 23567 lid 1 at 0xfffffe8040734c60
sleepq_block() at sleepq_block+0xa2
turnstile_block() at turnstile_block+0x40e
rw_vector_enter() at rw_vector_enter+0x2d0
genfs_lock() at genfs_lock+0x7b
VOP_LOCK() at VOP_LOCK+0x54
vn_lock() at vn_lock+0x82
getcwd_common() at getcwd_common+0x2cd
sys___getcwd() at sys___getcwd+0xae
syscall() at syscall+0x9c
--- syscall (number 296) ---
7f7ff6c9f6ba:


crash> bt/t 0t21797
trace: pid 21797 lid 1 at 0xfffffe804073cc60
sleepq_block() at sleepq_block+0xa2
turnstile_block() at turnstile_block+0x40e
rw_vector_enter() at rw_vector_enter+0x2d0
genfs_lock() at genfs_lock+0x7b
VOP_LOCK() at VOP_LOCK+0x54
vn_lock() at vn_lock+0x82
getcwd_common() at getcwd_common+0x2cd
sys___getcwd() at sys___getcwd+0xae
syscall() at syscall+0x9c
--- syscall (number 296) ---
7f7ff6c9f6ba:


crash> bt/t 0t17339
trace: pid 17339 lid 1 at 0xfffffe80407a4c60
sleepq_block() at sleepq_block+0xa2
turnstile_block() at turnstile_block+0x40e
rw_vector_enter() at rw_vector_enter+0x2d0
genfs_lock() at genfs_lock+0x7b
VOP_LOCK() at VOP_LOCK+0x54
vn_lock() at vn_lock+0x82
getcwd_common() at getcwd_common+0x2cd
sys___getcwd() at sys___getcwd+0xae
syscall() at syscall+0x9c
--- syscall (number 296) ---
7f7ff6c9f6ba:


crash> bt/t 0t13334
trace: pid 13334 lid 1 at 0xfffffe80406b0c60
sleepq_block() at sleepq_block+0xa2
turnstile_block() at turnstile_block+0x40e
rw_vector_enter() at rw_vector_enter+0x2d0
genfs_lock() at genfs_lock+0x7b
VOP_LOCK() at VOP_LOCK+0x54
vn_lock() at vn_lock+0x82
getcwd_common() at getcwd_common+0x2cd
sys___getcwd() at sys___getcwd+0xae
syscall() at syscall+0x9c
--- syscall (number 296) ---
7f7ff6c9f6ba:


crash> bt/t 0t14050
trace: pid 14050 lid 1 at 0xfffffe8040778c60
sleepq_block() at sleepq_block+0xa2
turnstile_block() at turnstile_block+0x40e
rw_vector_enter() at rw_vector_enter+0x2d0
genfs_lock() at genfs_lock+0x7b
VOP_LOCK() at VOP_LOCK+0x54
vn_lock() at vn_lock+0x82
getcwd_common() at getcwd_common+0x2cd
sys___getcwd() at sys___getcwd+0xae
syscall() at syscall+0x9c
--- syscall (number 296) ---
7f7ff6c9f6ba:


crash> bt/t 0t8085
trace: pid 8085 lid 1 at 0xfffffe80406ecc60
sleepq_block() at sleepq_block+0xa2
turnstile_block() at turnstile_block+0x40e
rw_vector_enter() at rw_vector_enter+0x2d0
genfs_lock() at genfs_lock+0x7b
VOP_LOCK() at VOP_LOCK+0x54
vn_lock() at vn_lock+0x82
getcwd_common() at getcwd_common+0x2cd
sys___getcwd() at sys___getcwd+0xae
syscall() at syscall+0x9c
--- syscall (number 296) ---
7f7ff6c9f6ba:
crash> ^Dvargaz:~/crash$ exit

Script done on Wed Oct 14 23:48:44 2015
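
In the traces above, the two cat processes block in vn_lock() called
from nfs_lookup(), and the six make processes block in vn_lock() called
from getcwd_common(). A minimal Python sketch for extracting that
summary from a saved transcript follows. The function name lock_callers
is an assumption for illustration, the regular expressions only assume
the "trace: pid N" and "func() at func+0x..." line shapes seen above,
and the sample is a shortened excerpt.

```python
import re

def lock_callers(transcript):
    """Map each pid to the function that called vn_lock(), if any."""
    callers = {}
    pid = None
    prev = None
    for line in transcript.splitlines():
        m = re.match(r"trace: pid (\d+) ", line)
        if m:
            # A new backtrace starts; forget the previous frame.
            pid, prev = int(m.group(1)), None
            continue
        m = re.match(r"(\w+)\(\) at ", line)
        if m and pid is not None:
            # Frames are listed innermost first, so the frame printed
            # right after vn_lock() is its caller.
            if prev == "vn_lock":
                callers[pid] = m.group(1)
            prev = m.group(1)
    return callers

# Shortened excerpt of the crash(8) output above.
sample = """\
trace: pid 3590 lid 1 at 0xfffffe8040758d00
sleepq_block() at sleepq_block+0xa2
cv_wait_sig() at cv_wait_sig+0xfe
trace: pid 7060 lid 1 at 0xfffffe804076c770
vn_lock() at vn_lock+0x82
nfs_lookup() at nfs_lookup+0xfb4
trace: pid 23567 lid 1 at 0xfffffe8040734c60
vn_lock() at vn_lock+0x82
getcwd_common() at getcwd_common+0x2cd
"""
result = lock_callers(sample)
print(result)
```

Processes that never reach vn_lock(), such as the waiting shell 3590,
are simply left out of the result.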

Note the complicated mount points, which might make any locking bugs
more likely to show up: the usual null mounts from pkg_comp, plus an
additional null mount of a local directory to do the actual building in
(so that the build itself doesn't need to go over NFS).

/dev/wd0a on / type ffs (local)
/dev/wd0f on /var type ffs (log, local)
/dev/wd0e on /usr type ffs (log, local)
/dev/wd0g on /home type ffs (log, local)
/dev/wd0h on /tmp type ffs (log, local)
kernfs on /kern type kernfs (local)
ptyfs on /dev/pts type ptyfs (local)
procfs on /proc type procfs (local)
procfs on /usr/pkg/emul/linux32/proc type procfs (read-only, local)
nfsserver:/mnt/vol1 on /mnt/vol1 type nfs
nfsserver:/mnt/scratch on /mnt/scratch type nfs
tmpfs on /var/shm type tmpfs (local)
/mnt/vol1/rhialto/cvs/src on /mnt/scratch/scratch/chroot/pkg_comp.amd64-7.0/default/usr/src type null (read-only)
/mnt/vol1/rhialto/cvs/pkgsrc on /mnt/scratch/scratch/chroot/pkg_comp.amd64-7.0/default/usr/pkgsrc type null (read-only)
/mnt/vol1/distfiles on /mnt/scratch/scratch/chroot/pkg_comp.amd64-7.0/default/pkg_comp/distfiles type null
/mnt/scratch/scratch/packages.amd64-7.0 on /mnt/scratch/scratch/chroot/pkg_comp.amd64-7.0/default/pkg_comp/packages type null
/home/rhialto/obj on /mnt/scratch/scratch/chroot/pkg_comp.amd64-7.0/default/pkg_comp/obj/pkgsrc type null (local)
procfs on /usr/pkg/emul/linux32/proc type procfs (local)
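
To make the layering explicit, here is a minimal Python sketch that
reports, for each null mount, which file system type backs its source
and which backs its mount point. The helper names parse_mounts and
backing_fs are assumptions for illustration, and the parser only assumes
the "SOURCE on TARGET type FSTYPE" line shape shown above. The sample is
a four-line excerpt of that output.

```python
def parse_mounts(text):
    """Return (source, target, fstype) tuples from mount(8)-style lines."""
    mounts = []
    for line in text.splitlines():
        parts = line.split()
        # Expected shape: SOURCE on TARGET type FSTYPE [(options)]
        if len(parts) >= 5 and parts[1] == "on" and parts[3] == "type":
            mounts.append((parts[0], parts[2], parts[4]))
    return mounts

def backing_fs(path, mounts):
    """Return the fstype of the longest non-null mount point containing path."""
    best_target, best_type = "", None
    for _, target, fstype in mounts:
        if fstype == "null":
            continue
        prefix = target.rstrip("/") + "/"
        if (path + "/").startswith(prefix) and len(target) > len(best_target):
            best_target, best_type = target, fstype
    return best_type

# Excerpt of the mount output above.
sample = """\
/dev/wd0a on / type ffs (local)
/dev/wd0g on /home type ffs (log, local)
nfsserver:/mnt/scratch on /mnt/scratch type nfs
/home/rhialto/obj on /mnt/scratch/scratch/chroot/pkg_comp.amd64-7.0/default/pkg_comp/obj/pkgsrc type null (local)
"""
mounts = parse_mounts(sample)
layers = [(src, backing_fs(src, mounts), backing_fs(tgt, mounts))
          for src, tgt, fstype in mounts if fstype == "null"]
for src, src_fs, tgt_fs in layers:
    print(src, "is on", src_fs, "and is mounted at a path on", tgt_fs)
```

On this excerpt the obj null mount comes out as an ffs directory mounted
at a path that lives on NFS, which is the extra layer described above.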

-Olaf.
-- 
___ Olaf 'Rhialto' Seibert  -- The Doctor: No, 'eureka' is Greek for
\X/ rhialto/at/xs4all.nl    -- 'this bath is too hot.'



