NetBSD-Bugs archive


kern/50375: layerfs (nullfs) locking problem leading to livelock

>Number:         50375
>Category:       kern
>Synopsis:       layerfs (nullfs) locking problem leading to livelock
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Oct 28 15:45:00 +0000 2015
>Originator:     Jeff Rizzo
>Release:        7.99.21/evbarm
>Environment:
NetBSD jetson1.lan 7.99.21 NetBSD 7.99.21 (JETSONTK1) #9: Thu Oct 15 14:36:15 PDT 2015 evbarm
>Description:
While doing pbulk builds with nullfs mounts in chroots on a 4-core ARM (Tegra TK1) system, I very frequently see the build stop making progress, with a number of processes stuck in 'tstile'.  This time I happened to notice that process 28575 was the first to enter tstile (see below).

When it does this, I can use crash and ddb to get info, and gdb against /dev/mem seems to work somewhat (not "info threads", though), but I have not been able to get a crash dump.

My interpretation of the debugging output below is that the "culprit" process is PID 14346, which was:
polkit  14346  0.0  0.1   4936  2552 ?     D     7:15AM  0:00.50 /usr/bi
 1001 14346 26273 34233 125  0   4936  2552 vnode   D    ?      0:00.50 /usr/bin/make _MAKE OPSYS OS_VERSION LOWER_OPSYS _PKGSRCDIR PKGTOOLS_VERSION _CC _PATH_ORIG _PKGSRC_BARRIER ALLOW_VULNERABLE_PACKAGES all
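
As a cross-check of that interpretation, the wait-for chain can be walked mechanically. The following is only a minimal Python sketch and was not part of the debugging session: the two tables are transcribed by hand from the ps/w and "show lock" output further down, and the names waits_on, owner_of and chain_root are made up for illustration.

```python
# LWP address -> lock address it sleeps on (wchan column of "ps/w | grep tstile").
waits_on = {
    "96f83460": "922b78e4",  # pid 29934 sh
    "9357ce20": "922b78e4",  # pid 23822 sh
    "93dea080": "935fb98c",  # pid 28524 sh
    "93983120": "951f781c",  # pid 21780 sh
    "96f831a0": "92b1d834",  # pid 28575 python3.4
    "92fec960": "922b78e4",  # pid 2319 gvfsd-trash
    "91c733e0": "951f781c",  # pid 0 lid 67 ioflush
    "91596840": "951f781c",  # pid 0 lid 9 vdrain
}

# Lock address -> owning LWP ("owner/count" field of "show lock").
owner_of = {
    "92b1d834": "93450300",  # pid 14346 make
    "922b78e4": "93dea080",
    "935fb98c": "93983120",
    "951f781c": "96f831a0",
}


def chain_root(lwp):
    """Follow lwp -> lock -> owner; return (path, root).

    root is the first LWP that is not blocked on one of the listed locks,
    or None if the walk runs into a cycle or a lock with no known owner.
    """
    path = []
    while lwp in waits_on:
        if lwp in path:
            return path, None  # cycle among the turnstile waiters
        path.append(lwp)
        lwp = owner_of.get(waits_on[lwp])
        if lwp is None:
            return path, None  # owner was not captured in the dump
    return path + [lwp], lwp


# Every tstile sleeper should lead back to the same root.
roots = {chain_root(lwp)[1] for lwp in waits_on}
print(sorted(roots))
```

With the data above this prints ['93450300'], the LWP of pid 14346. That LWP is not on a turnstile itself; its backtrace shows it asleep in vwait() under vget().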

My understanding is that the next step would be to look at the individual frames of the backtrace of that process to figure out what vp is - I would appreciate suggestions for how to do this with the system live, using either ddb or gdb against /dev/mem.  (Assume I don't know what I'm doing, and give me very specific instructions :)

crash> ps/l |grep tstile
29934    1 3   3         0           96f83460                 sh tstile
23822    1 3   1         0           9357ce20                 sh tstile
28524    1 3   0         0           93dea080                 sh tstile
21780    1 3   3         0           93983120                 sh tstile
28575    1 3   3         0           96f831a0          python3.4 tstile
2319     1 3   0         0           92fec960        gvfsd-trash tstile
0       67 3   2       200           91c733e0            ioflush tstile
0        9 3   0       200           91596840             vdrain tstile
crash> bt/a 96f831a0
trace: pid 28575 lid 1 at 0xa1f57aa4
0xa1f57aa4: mi_switch+0x10
0xa1f57ad4: sleepq_block+0xb4
0xa1f57b14: turnstile_block+0x318
0xa1f57b8c: rw_vector_enter+0x3c0
0xa1f57bbc: genfs_lock+0x68
0xa1f57be4: VOP_LOCK+0x40
0xa1f57c0c: layer_lock+0x44
0xa1f57c34: VOP_LOCK+0x40
0xa1f57c5c: vn_lock+0x88
0xa1f57cac: lookup_once+0x224
0xa1f57d7c: namei_tryemulroot+0x528
0xa1f57db4: namei+0x34
0xa1f57ddc: fd_nameiat.isra.0+0x64
0xa1f57e4c: do_sys_statat+0x84
0xa1f57f04: sys___stat50+0x2c
0xa1f57f7c: syscall+0xb8
0xa1f57fac: swi_handler+0xa0

crash> ps/w |grep tstile
29934    1               sh   netbsd   27 tstile       922b78e4
23822    1               sh   netbsd   27 tstile       922b78e4
28524    1               sh   netbsd   27 tstile       935fb98c
21780    1               sh   netbsd   27 tstile       951f781c
28575    1        python3.4   netbsd   27 tstile       92b1d834
2319     1      gvfsd-trash   netbsd   43 tstile       922b78e4
0       67           system   netbsd  124 tstile       951f781c
0        9           system   netbsd  125 tstile       951f781c

db{3}> show lock 92b1d834
lock address : 0x0000000092b1d834 type     :     sleep/adaptive
initialized  : 0x000000008136442c
shared holds :                  0 exclusive:                  1
shares wanted:                  0 exclusive:                  1
current cpu  :                  3 last held:                  2
current lwp  : 0x00000000915c10c0 last held: 0x0000000093450300
last locked* : 0x00000000813795f8 unlocked : 0x0000000081379714
owner/count  : 0x0000000093450300 flags    : 0x0000000000000007

Turnstile chain at 0x81609eb0.
=> Turnstile at 0x9706bd90 (wrq=0x9706bda0, rdq=0x9706bda8).
=> 0 waiting readers:
=> 1 waiting writers: 0x96f831a0

db{3}> bt/a 0x0000000093450300
trace: pid 14346 lid 1 at 0x9d6218c4
0x9d6218c4: netbsd:mi_switch+0x10
0x9d6218f4: netbsd:sleepq_block+0xb4
0x9d62192c: netbsd:cv_wait+0x130
0x9d621954: netbsd:vwait+0x50
0x9d62197c: netbsd:vget+0xd4
0x9d6219e4: netbsd:vcache_get+0x158
0x9d621a14: netbsd:layer_node_create+0x2c
0x9d621a44: netbsd:layer_lookup+0xfc
0x9d621a7c: netbsd:VOP_LOOKUP+0x48
0x9d621bdc: netbsd:getcwd_common+0x258
0x9d621bfc: netbsd:vn_isunder+0x2c
0x9d621c4c: netbsd:lookup_once+0xfc
0x9d621d1c: netbsd:namei_tryemulroot+0x528
0x9d621d54: netbsd:namei+0x34
0x9d621e2c: netbsd:vn_open+0x94
0x9d621eac: netbsd:do_open+0xb0
0x9d621edc: netbsd:do_sys_openat+0x7c
0x9d621f04: netbsd:sys_open+0x38
0x9d621f7c: netbsd:syscall+0xb8
0x9d621fac: netbsd:swi_handler+0xa0
db{3}> show lock 922b78e4
lock address : 0x00000000922b78e4 type     :     sleep/adaptive
initialized  : 0x000000008136442c
shared holds :                  0 exclusive:                  1
shares wanted:                  0 exclusive:                  3
current cpu  :                  3 last held:                  0
current lwp  : 0x00000000915c10c0 last held: 0x0000000093dea080
last locked* : 0x00000000813795f8 unlocked : 0x0000000081379714
owner/count  : 0x0000000093dea080 flags    : 0x0000000000000007

Turnstile chain at 0x81609f60.
=> Turnstile at 0x9706b6c8 (wrq=0x9706b6d8, rdq=0x9706b6e0).
=> 0 waiting readers:
=> 3 waiting writers: 0x92fec960 0x9357ce20 0x96f83460
db{3}> show lock 935fb98c
lock address : 0x00000000935fb98c type     :     sleep/adaptive
initialized  : 0x000000008136442c
shared holds :                  0 exclusive:                  1
shares wanted:                  0 exclusive:                  1
current cpu  :                  3 last held:                  3
current lwp  : 0x00000000915c10c0 last held: 0x0000000093983120
last locked* : 0x00000000813795f8 unlocked : 0x0000000081379714
owner/count  : 0x0000000093983120 flags    : 0x0000000000000007

Turnstile chain at 0x8160a008.
=> Turnstile at 0x9706afc8 (wrq=0x9706afd8, rdq=0x9706afe0).
=> 0 waiting readers:
=> 1 waiting writers: 0x93dea080
db{3}> show lock 951f781c
lock address : 0x00000000951f781c type     :     sleep/adaptive
initialized  : 0x000000008136442c
shared holds :                  0 exclusive:                  1
shares wanted:                  0 exclusive:                  3
current cpu  :                  3 last held:                  3
current lwp  : 0x00000000915c10c0 last held: 0x0000000096f831a0
last locked* : 0x00000000813795f8 unlocked : 0x0000000081379714
owner/count  : 0x0000000096f831a0 flags    : 0x0000000000000007

Turnstile chain at 0x81609e98.
=> Turnstile at 0x9706af90 (wrq=0x9706afa0, rdq=0x9706afa8).
=> 0 waiting readers:
=> 3 waiting writers: 0x91596840 0x91c733e0 0x93983120
db{3}> bt/a 0x0000000093dea080
trace: pid 28524 lid 1 at 0xa4aa7aa4
0xa4aa7aa4: netbsd:mi_switch+0x10
0xa4aa7ad4: netbsd:sleepq_block+0xb4
0xa4aa7b14: netbsd:turnstile_block+0x318
0xa4aa7b8c: netbsd:rw_enter+0x3c0
0xa4aa7bbc: netbsd:genfs_lock+0x68
0xa4aa7be4: netbsd:VOP_LOCK+0x40
0xa4aa7c0c: netbsd:layer_lock+0x44
0xa4aa7c34: netbsd:VOP_LOCK+0x40
0xa4aa7c5c: netbsd:vn_lock+0x88
0xa4aa7cac: netbsd:lookup_once+0x224
0xa4aa7d7c: netbsd:namei_tryemulroot+0x528
0xa4aa7db4: netbsd:namei+0x34
0xa4aa7ddc: netbsd:fd_nameiat.isra.0+0x64
0xa4aa7e4c: netbsd:do_sys_statat+0x84
0xa4aa7f04: netbsd:sys___stat50+0x2c
0xa4aa7f7c: netbsd:syscall+0xb8
0xa4aa7fac: netbsd:swi_handler+0xa0
db{3}> bt/a 0x0000000093983120
trace: pid 21780 lid 1 at 0x9ec71aa4
0x9ec71aa4: netbsd:mi_switch+0x10
0x9ec71ad4: netbsd:sleepq_block+0xb4
0x9ec71b14: netbsd:turnstile_block+0x318
0x9ec71b8c: netbsd:rw_enter+0x3c0
0x9ec71bbc: netbsd:genfs_lock+0x68
0x9ec71be4: netbsd:VOP_LOCK+0x40
0x9ec71c0c: netbsd:layer_lock+0x44
0x9ec71c34: netbsd:VOP_LOCK+0x40
0x9ec71c5c: netbsd:vn_lock+0x88
0x9ec71cac: netbsd:lookup_once+0x224
0x9ec71d7c: netbsd:namei_tryemulroot+0x528
0x9ec71db4: netbsd:namei+0x34
0x9ec71ddc: netbsd:fd_nameiat.isra.0+0x64
0x9ec71e4c: netbsd:do_sys_statat+0x84
0x9ec71f04: netbsd:sys___stat50+0x2c
0x9ec71f7c: netbsd:syscall+0xb8
0x9ec71fac: netbsd:swi_handler+0xa0
db{3}> bt/a 0x0000000096f831a0
trace: pid 28575 lid 1 at 0xa1f57aa4
0xa1f57aa4: netbsd:mi_switch+0x10
0xa1f57ad4: netbsd:sleepq_block+0xb4
0xa1f57b14: netbsd:turnstile_block+0x318
0xa1f57b8c: netbsd:rw_enter+0x3c0
0xa1f57bbc: netbsd:genfs_lock+0x68
0xa1f57be4: netbsd:VOP_LOCK+0x40
0xa1f57c0c: netbsd:layer_lock+0x44
0xa1f57c34: netbsd:VOP_LOCK+0x40
0xa1f57c5c: netbsd:vn_lock+0x88
0xa1f57cac: netbsd:lookup_once+0x224
0xa1f57d7c: netbsd:namei_tryemulroot+0x528
0xa1f57db4: netbsd:namei+0x34
0xa1f57ddc: netbsd:fd_nameiat.isra.0+0x64
0xa1f57e4c: netbsd:do_sys_statat+0x84
0xa1f57f04: netbsd:sys___stat50+0x2c
0xa1f57f7c: netbsd:syscall+0xb8
0xa1f57fac: netbsd:swi_handler+0xa0
db{3}> bt/a 96f83460
trace: pid 29934 lid 1 at 0x9e277a0c
0x9e277a0c: netbsd:mi_switch+0x10
0x9e277a3c: netbsd:sleepq_block+0xb4
0x9e277a7c: netbsd:turnstile_block+0x318
0x9e277af4: netbsd:rw_enter+0x3c0
0x9e277b24: netbsd:genfs_lock+0x68
0x9e277b4c: netbsd:VOP_LOCK+0x40
0x9e277b74: netbsd:layer_lock+0x44
0x9e277b9c: netbsd:VOP_LOCK+0x68
0x9e277bc4: netbsd:vn_lock+0x88
0x9e277bdc: netbsd:layerfs_root+0x38
0x9e277bfc: netbsd:VFS_ROOT+0x30
0x9e277c4c: netbsd:lookup_once+0x29c
0x9e277d1c: netbsd:namei_tryemulroot+0x528
0x9e277d54: netbsd:namei+0x34
0x9e277e2c: netbsd:vn_open+0x94
0x9e277eac: netbsd:do_open+0xb0
0x9e277edc: netbsd:do_sys_openat+0x7c
0x9e277f04: netbsd:sys_open+0x38
0x9e277f7c: netbsd:syscall+0xb8
0x9e277fac: netbsd:swi_handler+0xa0
db{3}> bt/a 9357ce20
trace: pid 23822 lid 1 at 0x9ce49a6c
0x9ce49a6c: netbsd:mi_switch+0x10
0x9ce49a9c: netbsd:sleepq_block+0xb4
0x9ce49adc: netbsd:turnstile_block+0x318
0x9ce49b54: netbsd:rw_enter+0x3c0
0x9ce49b84: netbsd:genfs_lock+0x68
0x9ce49bac: netbsd:VOP_LOCK+0x40
0x9ce49bd4: netbsd:layer_lock+0x44
0x9ce49bfc: netbsd:VOP_LOCK+0x68
0x9ce49c24: netbsd:vn_lock+0x88
0x9ce49c3c: netbsd:layerfs_root+0x38
0x9ce49c5c: netbsd:VFS_ROOT+0x30
0x9ce49cac: netbsd:lookup_once+0x29c
0x9ce49d7c: netbsd:namei_tryemulroot+0x528
0x9ce49db4: netbsd:namei+0x34
0x9ce49ddc: netbsd:fd_nameiat.isra.0+0x64
0x9ce49e4c: netbsd:do_sys_statat+0x84
0x9ce49f04: netbsd:sys___stat50+0x2c
0x9ce49f7c: netbsd:syscall+0xb8
0x9ce49fac: netbsd:swi_handler+0xa0
db{3}> bt/a 92fec960
trace: pid 2319 lid 1 at 0x9d483a0c
0x9d483a0c: netbsd:mi_switch+0x10
0x9d483a3c: netbsd:sleepq_block+0xb4
0x9d483a7c: netbsd:turnstile_block+0x318
0x9d483af4: netbsd:rw_enter+0x3c0
0x9d483b24: netbsd:genfs_lock+0x68
0x9d483b4c: netbsd:VOP_LOCK+0x40
0x9d483b74: netbsd:layer_lock+0x44
0x9d483b9c: netbsd:VOP_LOCK+0x68
0x9d483bc4: netbsd:vn_lock+0x88
0x9d483bdc: netbsd:layerfs_root+0x38
0x9d483bfc: netbsd:VFS_ROOT+0x30
0x9d483c4c: netbsd:lookup_once+0x29c
0x9d483d1c: netbsd:namei_tryemulroot+0x528
0x9d483d54: netbsd:namei+0x34
0x9d483e2c: netbsd:vn_open+0x94
0x9d483eac: netbsd:do_open+0xb0
0x9d483edc: netbsd:do_sys_openat+0x7c
0x9d483f04: netbsd:sys_open+0x38
0x9d483f7c: netbsd:syscall+0xb8
0x9d483fac: netbsd:swi_handler+0xa0
db{3}> bt/a 91c733e0
trace: pid 0 lid 67 at 0x9aaa9d64
0x9aaa9d64: netbsd:mi_switch+0x10
0x9aaa9d94: netbsd:sleepq_block+0xb4
0x9aaa9dd4: netbsd:turnstile_block+0x318
0x9aaa9e4c: netbsd:rw_enter+0x3c0
0x9aaa9e7c: netbsd:genfs_lock+0x68
0x9aaa9ea4: netbsd:VOP_LOCK+0x40
0x9aaa9ecc: netbsd:vn_lock+0x88
0x9aaa9f2c: netbsd:ffs_sync+0xb0
0x9aaa9f4c: netbsd:VFS_SYNC+0x30
0x9aaa9fac: netbsd:sched_sync+0x27c
db{3}> bt/a 91596840
trace: pid 0 lid 9 at 0x9a825d74
0x9a825d74: netbsd:mi_switch+0x10
0x9a825da4: netbsd:sleepq_block+0xb4
0x9a825de4: netbsd:turnstile_block+0x318
0x9a825e5c: netbsd:rw_enter+0x3c0
0x9a825e8c: netbsd:genfs_lock+0x68
0x9a825eb4: netbsd:VOP_LOCK+0x40
0x9a825edc: netbsd:layer_lock+0x44
0x9a825f04: netbsd:VOP_LOCK+0x40
0x9a825f2c: netbsd:vn_lock+0x88
0x9a825f5c: netbsd:vclean+...
...: netbsd:cleanvnode+0xf4
0x9a825fac: netbsd:vdrain_thread+0x68
>How-To-Repeat:
Build pbulk packages on top of layerfs
