On 15 Oct 2015, at 00:21, Rhialto <rhialto%falu.nl@localhost> wrote:

> On Wed 14 Oct 2015 at 09:39:40 +0200, J. Hannken-Illjes wrote:
>> Looks like a deadlock, two threads in tstile.
>>
>> Please take a backtrace (with arguments) of these threads.
>
> I've got a whole lot more in tstile, and that is even just from running
> pkg_comp in the chroot. I didn't try to interrupt anything yet.
>
> load averages: 0.00, 0.20, 0.44; up 0+02:23:43 22:43:52
> 78 processes: 76 sleeping, 2 on CPU
> CPU states: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
> Memory: 393M Act, 60K Inact, 31M Wired, 31M Exec, 273M File, 3239M Free
> Swap: 4096M Total, 4096M Free
>
>
> vargaz:~$ ps alxtp1
>  UID   PID  PPID   CPU PRI NI   VSZ   RSS WCHAN   STAT TTY      TIME COMMAND
> 1000  1391    74     0  85  0 13208  2528 wait    Is   ttyp1 0:00.02 -bash
>    0  1759  1391  1107  85  0 13304  1576 wait    I    ttyp1 0:00.13 /bin/sh /usr/pkg/sbin/pkg_comp chroot
>    0   865  1759  1107  85  0 13304  1140 wait    I    ttyp1 0:00.01 /bin/sh /pkg_comp/tmp/pkg_comp-sOjsoA.sh
>    0   874   865 13547  82  0 11088  1412 pause   I    ttyp1 0:00.01 /bin/ksh
>    0   267   874 20048  81  0 15360  1720 wait    I+   ttyp1 0:00.22 /bin/sh -e /usr/pkg/sbin/pkg_chk
>    0  9782   267 20048  81  0 15360  1448 wait    I+   ttyp1 0:00.00 sh -c cd /usr/pkgsrc/devel/mercurial && /usr/bin/make u
>    0  8085  9782     0 117  0 15224  3452 tstile  D+   ttyp1 0:00.14 /usr/bin/make update CLEANDEPENDS
>    0 26889  8085 29745  78  0 15360  1424 wait    I+   ttyp1 0:00.00 /bin/sh -c set -e; /usr/bin/env MAKECONF=/etc/mk.conf P
>    0 14050 26889     0 117  0 15224  3444 tstile  D+   ttyp1 0:00.14 /usr/bin/make _MAKE OPSYS OS_VERSION LOWER_OPSYS _PKGSR
>    0  6325 14050 22699  80  0 15360  1428 wait    I+   ttyp1 0:00.00 /bin/sh -c set -e; pkgpattern=mercurial-3.5.1;\t\t\t\t
>    0 13334  6325     0 117  0 15224  3452 tstile  D+   ttyp1 0:00.14 /usr/bin/make .MAKE.LEVEL.ENV CLEANDEPENDS HOST_OSTYPE
>    0  2892 13334 29745  78  0 15364  1444 wait    I+   ttyp1 0:00.00 /bin/sh -c set -e;\t\t\t\t\t\t\t\t exec 3<&0;\t\t\t\t\t
>    0 13425  2892 29745  78  0 15364  1136 wait    I+   ttyp1 0:00.00 /bin/sh -c set -e;\t\t\t\t\t\t\t\t exec 3<&0;\t\t\t\t\t
>    0 17339 13425     0 117  0 15224  3504 tstile  D+   ttyp1 0:00.16 /usr/bin/make .MAKE.LEVEL.ENV CLEANDEPENDS DEPENDS_TARG
>    0 11893 17339 23601  80  0 15364  1432 wait    I+   ttyp1 0:00.00 /bin/sh -c set -e; pkgpattern=py27-mercurial\\>=3.5.1;\
>    0 21797 11893     0 117  0 15228  3512 tstile  D+   ttyp1 0:00.18 /usr/bin/make .MAKE.LEVEL.ENV CLEANDEPENDS DEPENDS_TARG
>    0  1347 21797 23778  80  0 15364  1456 wait    I+   ttyp1 0:00.00 /bin/sh -c set -e;\t\t\t\t\t if test -n "" && /usr/pkg
>    0 23567  1347     0 117  0 15228  4032 tstile  D+   ttyp1 0:00.38 /usr/bin/make .MAKE.LEVEL.ENV CLEANDEPENDS DEPENDS_TARG
>    0  3383 23567 29360  78  0 15364  1432 wait    I+   ttyp1 0:00.00 /bin/sh -c (cd /pkg_comp/obj/pkgsrc/devel/py-mercurial/
>    0 21311  3383 28277  79  0 81652 11580 wait    I+   ttyp1 0:00.14 /usr/pkg/bin/python2.7 setup.py build
>    0 24114 21311 28277  79  0 15364  1424 wait    I+   ttyp1 0:00.01 /bin/sh /pkg_comp/obj/pkgsrc/devel/py-mercurial/default
>    0  3590 24114 28277  79  0 15364  1472 wait    I+   ttyp1 0:00.00 /bin/sh /usr/pkgsrc/mk/tools/msgfmt.sh
>    0  7060  3590 28277 117  0  4244   188 tstile  D+   ttyp1 0:00.00 /bin/cat
>    0 18497  3590 28277  79  0 10880  1064 pipe_wr I+   ttyp1 0:00.00 /bin/cat i18n/el.po
>    0 23883  3590     0 117  0  6580   236 netio   D+   ttyp1 0:00.00 /usr/bin/msgfmt -v -o mercurial/locale/el/LC_MESSAGES/h
>    0 27257  3590 28277 117  0  4244   188 tstile  D+   ttyp1 0:00.00 /bin/cat
>    0 29472  3590 28277  79  0 14244  2344 pipe_wr I+   ttyp1 0:00.01 /usr/bin/awk -f /usr/bin/awk
>
> (I've re-arranged the order to get parents before children)
>
> Here are backtraces of the processes in tstile (and the shell that
> spawned the 4 leaf children). I have kept the dump so I can examine it
> further.
>
> Unfortunately, crash(8) didn't give me arguments, nor did ddb when I
> tried that (I used the GENERIC kernel, what options do I need to get the
> arguments?)
>
> Script started on Wed Oct 14 23:41:43 2015
> vargaz:~/crash$ crash -M netbsd.3.core -N netbsd.test
> Crash version 7.0, image version 7.99.21.
> WARNING: versions differ, you may not be able to examine this image.
> System panicked: dump forced via kernel debugger
> Backtrace from time of crash is available.
>
>
> crash> bt/t 0t3590
> trace: pid 3590 lid 1 at 0xfffffe8040758d00
> sleepq_block() at sleepq_block+0xa2
> cv_wait_sig() at cv_wait_sig+0xfe
> do_sys_wait() at do_sys_wait+0x22c
> sys___wait450() at sys___wait450+0x3a
> syscall() at syscall+0x9c
> --- syscall (number 449) ---
> 7f7ff683c1ea:
>
>
> crash> bt/t 0t7060
> trace: pid 7060 lid 1 at 0xfffffe804076c770
> sleepq_block() at sleepq_block+0xa2
> turnstile_block() at turnstile_block+0x40e
> rw_vector_enter() at rw_vector_enter+0x2d0
> genfs_lock() at genfs_lock+0x7b
> VOP_LOCK() at VOP_LOCK+0x54
> vn_lock() at vn_lock+0x82
> nfs_lookup() at nfs_lookup+0xfb4
> VOP_LOOKUP() at VOP_LOOKUP+0xa8
> lookup_once() at lookup_once+0x216
> namei_tryemulroot() at namei_tryemulroot+0x5b0
> namei() at namei+0x29
> vn_open() at vn_open+0x8e
> do_open() at do_open+0x111
> do_sys_openat() at do_sys_openat+0x68
> sys_open() at sys_open+0x24
> syscall() at syscall+0x9c
> --- syscall (number 5) ---
> 7f7ff7c0c20a:
>
>
> crash> bt/t 0t27257
> trace: pid 27257 lid 1 at 0xfffffe8040748770
> sleepq_block() at sleepq_block+0xa2
> turnstile_block() at turnstile_block+0x40e
> rw_vector_enter() at rw_vector_enter+0x2d0
> genfs_lock() at genfs_lock+0x7b
> VOP_LOCK() at VOP_LOCK+0x54
> vn_lock() at vn_lock+0x82
> nfs_lookup() at nfs_lookup+0xfb4
> VOP_LOOKUP() at VOP_LOOKUP+0xa8
> lookup_once() at lookup_once+0x216
> namei_tryemulroot() at namei_tryemulroot+0x5b0
> namei() at namei+0x29
> vn_open() at vn_open+0x8e
> do_open() at do_open+0x111
> do_sys_openat() at do_sys_openat+0x68
> sys_open() at sys_open+0x24
> syscall() at syscall+0x9c
> --- syscall (number 5) ---
> 7f7ff7c0c20a:
>
>
> crash> bt/t 0t23567
> trace: pid 23567 lid 1 at 0xfffffe8040734c60
> sleepq_block() at sleepq_block+0xa2
> turnstile_block() at turnstile_block+0x40e
> rw_vector_enter() at rw_vector_enter+0x2d0
> genfs_lock() at genfs_lock+0x7b
> VOP_LOCK() at VOP_LOCK+0x54
> vn_lock() at vn_lock+0x82
> getcwd_common() at getcwd_common+0x2cd
> sys___getcwd() at sys___getcwd+0xae
> syscall() at syscall+0x9c
> --- syscall (number 296) ---
> 7f7ff6c9f6ba:
>
>
> crash> bt/t 0t21797
> trace: pid 21797 lid 1 at 0xfffffe804073cc60
> sleepq_block() at sleepq_block+0xa2
> turnstile_block() at turnstile_block+0x40e
> rw_vector_enter() at rw_vector_enter+0x2d0
> genfs_lock() at genfs_lock+0x7b
> VOP_LOCK() at VOP_LOCK+0x54
> vn_lock() at vn_lock+0x82
> getcwd_common() at getcwd_common+0x2cd
> sys___getcwd() at sys___getcwd+0xae
> syscall() at syscall+0x9c
> --- syscall (number 296) ---
> 7f7ff6c9f6ba:
>
>
> crash> bt/t 0t17339
> trace: pid 17339 lid 1 at 0xfffffe80407a4c60
> sleepq_block() at sleepq_block+0xa2
> turnstile_block() at turnstile_block+0x40e
> rw_vector_enter() at rw_vector_enter+0x2d0
> genfs_lock() at genfs_lock+0x7b
> VOP_LOCK() at VOP_LOCK+0x54
> vn_lock() at vn_lock+0x82
> getcwd_common() at getcwd_common+0x2cd
> sys___getcwd() at sys___getcwd+0xae
> syscall() at syscall+0x9c
> --- syscall (number 296) ---
> 7f7ff6c9f6ba:
>
>
> crash> bt/t 0t13334
> trace: pid 13334 lid 1 at 0xfffffe80406b0c60
> sleepq_block() at sleepq_block+0xa2
> turnstile_block() at turnstile_block+0x40e
> rw_vector_enter() at rw_vector_enter+0x2d0
> genfs_lock() at genfs_lock+0x7b
> VOP_LOCK() at VOP_LOCK+0x54
> vn_lock() at vn_lock+0x82
> getcwd_common() at getcwd_common+0x2cd
> sys___getcwd() at sys___getcwd+0xae
> syscall() at syscall+0x9c
> --- syscall (number 296) ---
> 7f7ff6c9f6ba:
>
>
> crash> bt/t 0t14050
> trace: pid 14050 lid 1 at 0xfffffe8040778c60
> sleepq_block() at sleepq_block+0xa2
> turnstile_block() at turnstile_block+0x40e
> rw_vector_enter() at rw_vector_enter+0x2d0
> genfs_lock() at genfs_lock+0x7b
> VOP_LOCK() at VOP_LOCK+0x54
> vn_lock() at vn_lock+0x82
> getcwd_common() at getcwd_common+0x2cd
> sys___getcwd() at sys___getcwd+0xae
> syscall() at syscall+0x9c
> --- syscall (number 296) ---
> 7f7ff6c9f6ba:
>
>
> crash> bt/t 0t8085
> trace: pid 8085 lid 1 at 0xfffffe80406ecc60
> sleepq_block() at sleepq_block+0xa2
> turnstile_block() at turnstile_block+0x40e
> rw_vector_enter() at rw_vector_enter+0x2d0
> genfs_lock() at genfs_lock+0x7b
> VOP_LOCK() at VOP_LOCK+0x54
> vn_lock() at vn_lock+0x82
> getcwd_common() at getcwd_common+0x2cd
> sys___getcwd() at sys___getcwd+0xae
> syscall() at syscall+0x9c
> --- syscall (number 296) ---
> 7f7ff6c9f6ba:
> crash> ^Dvargaz:~/crash$ exit
>
> Script done on Wed Oct 14 23:48:44 2015
>
> Note the complicated mount points, which might make any bugs in locking
> more likely to pop up: the usual null mounts from pkg_comp, but with an
> additional mount of a local directory for actually building in (so that
> that doesn't need to go over NFS).
>
> /dev/wd0a on / type ffs (local)
> /dev/wd0f on /var type ffs (log, local)
> /dev/wd0e on /usr type ffs (log, local)
> /dev/wd0g on /home type ffs (log, local)
> /dev/wd0h on /tmp type ffs (log, local)
> kernfs on /kern type kernfs (local)
> ptyfs on /dev/pts type ptyfs (local)
> procfs on /proc type procfs (local)
> procfs on /usr/pkg/emul/linux32/proc type procfs (read-only, local)
> nfsserver:/mnt/vol1 on /mnt/vol1 type nfs
> nfsserver:/mnt/scratch on /mnt/scratch type nfs
> tmpfs on /var/shm type tmpfs (local)
> /mnt/vol1/rhialto/cvs/src on /mnt/scratch/scratch/chroot/pkg_comp.amd64-7.0/default/usr/src type null (read-only)
> /mnt/vol1/rhialto/cvs/pkgsrc on /mnt/scratch/scratch/chroot/pkg_comp.amd64-7.0/default/usr/pkgsrc type null (read-only)
> /mnt/vol1/distfiles on /mnt/scratch/scratch/chroot/pkg_comp.amd64-7.0/default/pkg_comp/distfiles type null
> /mnt/scratch/scratch/packages.amd64-7.0 on /mnt/scratch/scratch/chroot/pkg_comp.amd64-7.0/default/pkg_comp/packages type null
> /home/rhialto/obj on /mnt/scratch/scratch/chroot/pkg_comp.amd64-7.0/default/pkg_comp/obj/pkgsrc type null (local)
> procfs on /usr/pkg/emul/linux32/proc type procfs (local)

Examining the crash dump further I get:

Threads 23567, 21797, 17339, 13334, 14050 and 8085 want vnode 0xfffffe81318ae748.

Thread 7060 holds vnode 0xfffffe81318ae748 and wants vnode 0xfffffe811c0fce50.

Thread 27257 holds vnode 0xfffffe811c0fce50 and wants vnode 0xfffffe811c0fc760.

Vnode 0xfffffe811c0fc760 is (v_size 55295, v_flag VV_MAPPED | VI_EXECMAP,
v_type = VREG, v_tag = VT_NFS on /mnt/scratch).

Thread 23883 holds vnode 0xfffffe811c0fc760, its trace is

0 mi_switch (l=l@entry=0xfffffe810907ab60)
1 sleepq_block (timo=timo@entry=500, catch_p=catch_p@entry=false)
2 cv_timedwait (cv=cv@entry=0xfffffe8135dfbcf8, mtx=mtx@entry=0xfffffe813fdaaf40,
    timo=500)
3 sbwait (sb=sb@entry=0xfffffe8135dfbcb0)
4 soreceive (so=0xfffffe8135dfbb68, paddr=0xfffffe8040710b60,
    uio=0xfffffe8040710b98, mp0=<optimized out>, controlp=0x0,
    flagsp=0xfffffe8040710b2c)
5 nfs_receive (l=0xfffffe810907ab60, mp=0xfffffe8040710b58,
    aname=0xfffffe8040710b60, rep=0xfffffe813eef1cf0)
6 nfs_reply (lwp=0xfffffe810907ab60, myrep=0xfffffe813eef1cf0)
7 nfs_request (np=np@entry=0xfffffe812bc3a6a0, mrest=mrest@entry=0xfffffe811a75e000,
    procnum=procnum@entry=1, lwp=0xfffffe810907ab60, cred=0xfffffe8124face40,
    mrp=mrp@entry=0xfffffe8040710c50, mdp=mdp@entry=0xfffffe8040710c58,
    dposp=dposp@entry=0xfffffe8040710c48, rexmitp=rexmitp@entry=0x0)
8 nfs_getattr (v=0xfffffe8040710cb0)
9 VOP_GETATTR (vp=vp@entry=0xfffffe811c0fc760, vap=vap@entry=0xfffffe8040710ce8,
    cred=<optimized out>)
10 vn_stat (vp=vp@entry=0xfffffe811c0fc760, sb=sb@entry=0xfffffe8040710e00)
11 vn_statfile (fp=<optimized out>, sb=0xfffffe8040710e00)
12 do_sys_fstat (fd=4, sb=sb@entry=0xfffffe8040710e00)
13 sys___fstat50 (l=<optimized out>, uap=0xfffffe8040710f00,
    retval=<optimized out>)
14 sy_call (rval=0xfffffe8040710eb8, uap=0xfffffe8040710f00,
    l=0xfffffe810907ab60, sy=0xffffffff81108c60 <sysent+10560>)
15 sy_invoke (code=440, rval=0xfffffe8040710eb8, uap=0xfffffe8040710f00,
    l=0xfffffe810907ab60, sy=0xffffffff81108c60 <sysent+10560>)
16 syscall (frame=0xfffffe8040710f00)
17 Xsyscall ()

Looks like we are waiting for a NFS operation to complete.

Did the machine hang here?

--
J. Hannken-Illjes - hannken%eis.cs.tu-bs.de@localhost - TU Braunschweig (Germany)
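
The holds/wants relationships read off the crash dump above can be written down as a small wait-for graph and walked mechanically. The following is only a sketch: the thread IDs and vnode addresses are copied from the message, while the table layout and the function name `wait_chain` are made up for illustration and are not part of crash(8), ddb, or any NetBSD tool.

```python
# Wait-for relationships as listed in the crash dump analysis.
# WANTS maps a blocked thread to the vnode it is waiting for;
# HOLDER maps a vnode to the thread currently holding its lock.
# (Both tables are hand-copied from the message; the layout is hypothetical.)
WANTS = {
    23567: 0xfffffe81318ae748,
    21797: 0xfffffe81318ae748,
    17339: 0xfffffe81318ae748,
    13334: 0xfffffe81318ae748,
    14050: 0xfffffe81318ae748,
    8085: 0xfffffe81318ae748,
    7060: 0xfffffe811c0fce50,
    27257: 0xfffffe811c0fc760,
}
HOLDER = {
    0xfffffe81318ae748: 7060,
    0xfffffe811c0fce50: 27257,
    0xfffffe811c0fc760: 23883,
}


def wait_chain(thread):
    """Follow wants -> holder links starting at `thread`.

    Returns (chain, is_cycle): the list of threads visited, and whether
    the walk came back to a thread already in the chain (a lock cycle).
    """
    chain = [thread]
    seen = {thread}
    while chain[-1] in WANTS:
        holder = HOLDER.get(WANTS[chain[-1]])
        if holder is None:
            break  # lock holder unknown, the chain cannot be extended
        chain.append(holder)
        if holder in seen:
            return chain, True
        seen.add(holder)
    return chain, False


chain, is_cycle = wait_chain(8085)
print(" -> ".join(str(t) for t in chain), "(cycle)" if is_cycle else "(no cycle)")
```

With the data above, every chain ends at thread 23883, which is not waiting for a vnode lock but sleeping in nfs_request. That matches the reading that the tstile sleepers form a linear chain behind one outstanding NFS operation, not a lock cycle among themselves.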