Current-Users archive

Unkillable process got stuck in uao_put()



Hello folks,
on NetBSD/amd64 -current (2019-04-27 11:40 UTC), when trying to grep
source code via ag (part of textproc/the_silver_searcher) on a
tmpfs, the ag process gets stuck in uao_put() and becomes unkillable.

I can easily reproduce^[0] this using the libvirt-5.2.0 distfile^[1]:
extract it on a tmpfs (via `bsdtar xJf libvirt-5.2.0') and
then, in libvirt-5.2.0, run:

 | % ag std

Via ^T I can see:

 | [ 297.2992258] load: 0.06  cmd: ag 558 [tstile tstile tstile tstile tstile tstile uao_put parked] 1.20u 0.64s 0% 4144k  

The crash(8) `ps' output for the LWPs of the stuck process is:

 | crash> ps
 | PID    LID S CPU     FLAGS       STRUCT LWP *               NAME WAIT
 | [...]
 | 558      8 3   0         0   fffff4f9f9fea900                 ag tstile
 | 558      7 3   1         0   fffff4f9f9fea4c0                 ag tstile
 | 558      6 3   7         0   fffff4f9f9fea080                 ag tstile
 | 558      5 3   5         0   fffff4f9fa04a8e0                 ag tstile
 | 558      4 3   1         0   fffff4f9fa04a4a0                 ag tstile
 | 558      3 3   4         0   fffff4f9fa04a060                 ag tstile
 | 558      2 3   2         0   fffff4fa0f30f8c0                 ag uao_put
 | 558      1 3   3        80   fffff4fa0f01b300                 ag parked
 | [...]

...and their traces:

 | # echo ps | crash | awk '$1 == 558 { print "bt/a " $6 }' | crash
 | Crash version 8.99.37, image version 8.99.37.
 | Output from a running system is unreliable.
 | trace: pid 558 lid 8 at 0xffffd381cbc72c40
 | sleepq_block() at sleepq_block+0xb8
 | turnstile_block() at turnstile_block+0x4f8
 | rw_vector_enter() at rw_vector_enter+0x20f
 | uvm_fault_internal() at uvm_fault_internal+0x16c5
 | trap() at trap+0x343
 | --- trap (number 6) ---
 | 409248:
 | trace: pid 558 lid 7 at 0xffffd381cbc6dc40
 | sleepq_block() at sleepq_block+0xb8
 | turnstile_block() at turnstile_block+0x4f8
 | rw_vector_enter() at rw_vector_enter+0x20f
 | uvm_fault_internal() at uvm_fault_internal+0x153
 | trap() at trap+0x343
 | --- trap (number 6) ---
 | 409248:
 | trace: pid 558 lid 6 at 0xffffd381cbc68c40
 | sleepq_block() at sleepq_block+0xb8
 | turnstile_block() at turnstile_block+0x4f8
 | rw_vector_enter() at rw_vector_enter+0x20f
 | uvm_fault_internal() at uvm_fault_internal+0x16c5
 | trap() at trap+0x343
 | --- trap (number 6) ---
 | 408e94:
 | trace: pid 558 lid 5 at 0xffffd381cbc63c40
 | sleepq_block() at sleepq_block+0xb8
 | turnstile_block() at turnstile_block+0x4f8
 | rw_vector_enter() at rw_vector_enter+0x20f
 | uvm_fault_internal() at uvm_fault_internal+0x153
 | trap() at trap+0x343
 | --- trap (number 6) ---
 | 408e94:
 | trace: pid 558 lid 4 at 0xffffd381cbc3de20
 | sleepq_block() at sleepq_block+0xb8
 | turnstile_block() at turnstile_block+0x4f8
 | rw_vector_enter() at rw_vector_enter+0x20f
 | vm_map_lock() at vm_map_lock+0x66
 | sys_munmap() at sys_munmap+0x58
 | syscall() at syscall+0x188
 | --- syscall (number 73) ---
 | 777180d9a1ba:
 | trace: pid 558 lid 3 at 0xffffd381cbc38c40
 | sleepq_block() at sleepq_block+0xb8
 | turnstile_block() at turnstile_block+0x4f8
 | rw_vector_enter() at rw_vector_enter+0x20f
 | uvm_fault_internal() at uvm_fault_internal+0x16c5
 | trap() at trap+0x343
 | --- trap (number 6) ---
 | 408e94:
 | trace: pid 558 lid 2 at 0xffffd381cbc33b20
 | sleepq_block() at sleepq_block+0xb8
 | mtsleep() at mtsleep+0x149
 | uao_put() at uao_put+0x268
 | VOP_PUTPAGES() at VOP_PUTPAGES+0x53
 | uvm_fault_internal() at uvm_fault_internal+0x104a
 | trap() at trap+0x343
 | --- trap (number 6) ---
 | 408e94:
 | trace: pid 558 lid 1 at 0xffffd381cb49eed0
 | sleepq_block() at sleepq_block+0xb8
 | lwp_park() at lwp_park+0x117
 | sys____lwp_park60() at sys____lwp_park60+0x5a
 | syscall() at syscall+0x188
 | --- syscall (number 478) ---
 | 777180cb3d7a:

Please let me know how I can help debug this problem further and
I will try to collect any information needed (unfortunately
I have not found a simpler way to reproduce it so far).


Thank you very much!


[0]: Sometimes the `ag' process exits successfully, but retrying a
     few times (5-10 times) eventually gets it stuck.
[1]: http://libvirt.org/sources/libvirt-5.2.0.tar.xz

