NetBSD-Bugs archive

kern/50313: processes get stuck doing exec



>Number:         50313
>Category:       kern
>Synopsis:       processes get stuck doing exec
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Oct 08 10:05:00 +0000 2015
>Originator:     Havard Eidnes
>Release:        NetBSD 6.1_STABLE
>Organization:
	UNINETT AS
>Environment:
System: NetBSD smistad.uninett.no 6.1_STABLE NetBSD 6.1_STABLE (MAANEN) #1: Wed Oct 29 11:27:25 CET 2014 he%smistad.uninett.no@localhost:/usr/obj/sys/arch/i386/compile/MAANEN i386
Architecture: i386
Machine: i386
>Description:
	All too often, processes get stuck during exec on this machine.
	According to ps(1) they remain in the "R" (runnable) state.
	Running crash(8) on the live system shows this call chain for
	the stuck processes:

syscall -> sys_execve -> execve1 -> execve_loadvm -> pool_get ->
pool_grow -> exec_pool_alloc -> uvm_km_alloc -> uvm_map ->
uvm_map_prepare -> cv_timedwait -> sleepq_block

	as shown in this session with "crash":

smistad# crash
Crash version 6.1_STABLE, image version 6.1_STABLE.
Output from a running system is unreliable.
crash>
crash> t/t 0t2052
trace: pid 2052 lid 1 at 0xddb69938
sleepq_block(64,0,c0c54ee0,c0cda590,0,c4677c00,c4676680,c0d1dc00,c0d1dc10,c0d1dc14) at sleepq_block+0xa3
cv_timedwait(c0d1dc14,c0d1dc10,64,ddb699bc,c0d1da20,ffffffff,ffffffff,0,801727,0) at cv_timedwait+0x126
uvm_map_prepare(c0d1dc00,c0000000,40000,c0d1da20,ffffffff,ffffffff,0,801727,ddb699e8,0) at uvm_map_prepare+0x160
uvm_map(c0d1dc00,ddb69a58,40000,c0d1da20,ffffffff,ffffffff,0,801727,800002,c0d0aa94) at uvm_map+0x7f
uvm_km_alloc(c0d1dc00,40000,0,800002,ddb69ac0,c07daf4a,c0d0aa20,1,0,0) at uvm_km_alloc+0xe6
exec_pool_alloc(c0d0aa20,1,0,0,0,c055a02e,0,0,0,c0cda6e0) at exec_pool_alloc+0x2b
pool_grow(c0d0aa94,0,c054,c76e9180,ce7b5c09,4,c0d0aa98,c0d0aa94,1,0) at pool_grow+0x2a
pool_get(c0d0aa20,1,c8d99948,cea7cf80,c08d0731,0,ddb69b80,ddb69bec,ddb69be8,c7731c14) at pool_get+0x79
execve_loadvm(bb9123cc,c054f830,ddb69b68,bb912404,c4eedc00,c5327c00,cf1d7008,404,404,9) at execve_loadvm+0x1b0
execve1(cf0dfd40,bb912404,bb9123bc,bb9123cc,c054f830,ddb69d3c,c080312a,cf0dfd40,ddb69d00,ddb69d28) at execve1+0x32
sys_execve(cf0dfd40,ddb69d00,ddb69d28,c0803389,c7834cc0,3b,bb90e000,bb90e020,ddb69d00,c7731c14) at sys_execve+0x30
syscall(ddb69d48,bbb900b3,ab,bfbf001f,bbb9001f,bb912404,bb912404,bfbfeb20,bb9123bc,0) at syscall+0xaa
crash> t/t 0t498
trace: pid 498 lid 1 at 0xdd1e5938
sleepq_block(64,0,c0c54ee0,c0cda590,0,c4677d40,c4676680,c0d1dc00,c0d1dc10,c0d1dc14) at sleepq_block+0xa3
cv_timedwait(c0d1dc14,c0d1dc10,64,dd1e59bc,c0d1da20,ffffffff,ffffffff,0,801727,0) at cv_timedwait+0x126
uvm_map_prepare(c0d1dc00,c0000000,40000,c0d1da20,ffffffff,ffffffff,0,801727,dd1e59e8,0) at uvm_map_prepare+0x160
uvm_map(c0d1dc00,dd1e5a58,40000,c0d1da20,ffffffff,ffffffff,0,801727,800002,c0d0aa94) at uvm_map+0x7f
uvm_km_alloc(c0d1dc00,40000,0,800002,dd1e5ac0,c07daf4a,c0d0aa20,1,0,0) at uvm_km_alloc+0xe6
exec_pool_alloc(c0d0aa20,1,0,0,0,c055a02e,0,0,0,c0cda6e0) at exec_pool_alloc+0x2b
pool_grow(c0d0aa94,0,c054,c76e9b40,ce7b5009,5,c0d0aa98,c0d0aa94,1,0) at pool_grow+0x2a
pool_get(c0d0aa20,1,ce9a3d28,cac41240,c08d0ae3,0,dd1e5b80,dd1e5bec,dd1e5be8,c4c721d4) at pool_get+0x79
execve_loadvm(bb907c04,c054f830,dd1e5b68,bb907ae4,c4ef7c00,c4ef7000,cf87a008,404,404,9) at execve_loadvm+0x1b0
execve1(cf8d9aa0,bb907ae4,bb907b84,bb907c04,c054f830,dd1e5d3c,c080312a,cf8d9aa0,dd1e5d00,dd1e5d28) at execve1+0x32
sys_execve(cf8d9aa0,dd1e5d00,dd1e5d28,c0803389,c52ed364,3b,bba9b000,bba9b320,dd1e5d00,c4c721d4) at sys_execve+0x30
syscall(dd1e5d48,bbb900b3,ab,bfbf001f,bbb9001f,bb907ae4,bb907ae4,bfbfd6d8,bb907b84,7d7b7cff) at syscall+0xaa
crash> t/t 0t22404
trace: pid 22404 lid 1 at 0xdd0ee938
sleepq_block(64,0,c0c54ee0,c0cda590,0,c4677d40,c4676680,c0d1dc00,c0d1dc10,c0d1dc14) at sleepq_block+0xa3
cv_timedwait(c0d1dc14,c0d1dc10,64,dd0ee9bc,c0d1da20,ffffffff,ffffffff,0,801727,0) at cv_timedwait+0x126
uvm_map_prepare(c0d1dc00,c0000000,40000,c0d1da20,ffffffff,ffffffff,0,801727,dd0ee9e8,0) at uvm_map_prepare+0x160
uvm_map(c0d1dc00,dd0eea58,40000,c0d1da20,ffffffff,ffffffff,0,801727,800002,c0d0aa94) at uvm_map+0x7f
uvm_km_alloc(c0d1dc00,40000,0,800002,dd0eeac0,c07daf4a,c0d0aa20,1,0,0) at uvm_km_alloc+0xe6
exec_pool_alloc(c0d0aa20,1,0,0,0,c055a02e,0,0,0,c0cda6e0) at exec_pool_alloc+0x2b
pool_grow(c0d0aa94,0,c054,c76e9b40,c8511c05,3,c0d0aa98,c0d0aa94,1,0) at pool_grow+0x2a
pool_get(c0d0aa20,1,ce9a3c08,c8091e80,c08d0ae3,0,dd0eeb80,dd0eebec,dd0eebe8,cf7fac30) at pool_get+0x79
execve_loadvm(bb907b04,c054f830,dd0eeb68,bb907ae4,c4ebec00,c5327400,cf8c7808,404,404,9) at execve_loadvm+0x1b0
execve1(cf8d9800,bb907ae4,bb907afc,bb907b04,c054f830,dd0eed3c,c080312a,cf8d9800,dd0eed00,dd0eed28) at execve1+0x32
sys_execve(cf8d9800,dd0eed00,dd0eed28,c0803389,cf88ba44,3b,bba9b000,bba9b320,dd0eed00,cf7fac30) at sys_execve+0x30
syscall(dd0eed48,bbb900b3,ab,bfbf001f,bbb9001f,bb907ae4,bb907ae4,bfbfd6d8,bb907afc,6d6d6bff) at syscall+0xaa
crash> 
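
	As a cross-check of the summary above, here is a minimal Python
	sketch that re-reads the innermost five frames of the pid 2052
	trace. It assumes the first words printed for each frame are the
	C arguments (usual for i386 traces, but not guaranteed), that the
	second argument of uvm_km_alloc() is the size, and that the third
	argument of cv_timedwait() is the timeout in ticks. The helper
	name parse_frames is made up for this illustration.

```python
import re

# One frame as printed by crash(8): name(hex,hex,...) at name+0xoffset
FRAME_RE = re.compile(r"^(\w+)\(([^)]*)\) at \w+\+0x[0-9a-f]+$")

# Innermost five frames of the pid 2052 trace, copied from the report.
TRACE = """\
sleepq_block(64,0,c0c54ee0,c0cda590,0,c4677c00,c4676680,c0d1dc00,c0d1dc10,c0d1dc14) at sleepq_block+0xa3
cv_timedwait(c0d1dc14,c0d1dc10,64,ddb699bc,c0d1da20,ffffffff,ffffffff,0,801727,0) at cv_timedwait+0x126
uvm_map_prepare(c0d1dc00,c0000000,40000,c0d1da20,ffffffff,ffffffff,0,801727,ddb699e8,0) at uvm_map_prepare+0x160
uvm_map(c0d1dc00,ddb69a58,40000,c0d1da20,ffffffff,ffffffff,0,801727,800002,c0d0aa94) at uvm_map+0x7f
uvm_km_alloc(c0d1dc00,40000,0,800002,ddb69ac0,c07daf4a,c0d0aa20,1,0,0) at uvm_km_alloc+0xe6
"""


def parse_frames(text):
    """Return [(function, [stack words as ints])], innermost frame first."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            words = [int(w, 16) for w in match.group(2).split(",") if w]
            frames.append((match.group(1), words))
    return frames


frames = parse_frames(TRACE)
args = dict(frames)

# crash prints the innermost frame first; reverse for "a -> b -> c" order.
chain = " -> ".join(name for name, _ in reversed(frames))

# Assumed argument positions: uvm_km_alloc(map, size, ...),
# cv_timedwait(cv, mtx, ticks).
alloc_size = args["uvm_km_alloc"][1]
wait_ticks = args["cv_timedwait"][2]

print(chain)
print("requested size:", alloc_size, "bytes")
print("cv_timedwait timeout:", wait_ticks, "ticks")
```

	Under those assumptions the requested size is 0x40000 = 262144
	bytes and the timeout is 0x64 = 100 ticks. All three traces show
	the same first cv_timedwait() word, c0d1dc14, so the processes
	appear to be sleeping on the same condition variable.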

	execve_loadvm() contains only one pool_get() call, this one:

		/* allocate an argument buffer */
		data->ed_argp = pool_get(&exec_pool, PR_WAITOK);

	and "vmstat -m" says:

Memory resource pool statistics
Name        Size Requests Fail Releases Pgreq Pgrel Npage Hiwat Minpg Maxpg Idle
...
execargs    262144  40498    0    40498     9     7     2     4     0    16    2
...

	At the time I had started a pkgsrc build, and one of the
	stuck processes was a child within that build tree.
	Aborting the build changed the above statistics very
	little:

execargs    262144  40559    0    40559     9     7     2     4     0    16    2
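
	To make the two "execargs" lines easier to compare, here is a
	small Python sketch that only restates the printed counters. It
	assumes the columns mean what the vmstat -m header suggests
	(Npage = pages currently held by the pool, Idle = pages with no
	items in use), and the helper names parse_pool_line and
	summarize are made up for this illustration. It does not explain
	the hang.

```python
# Column names taken from the "vmstat -m" header in the report.
FIELDS = ["Name", "Size", "Requests", "Fail", "Releases", "Pgreq",
          "Pgrel", "Npage", "Hiwat", "Minpg", "Maxpg", "Idle"]


def parse_pool_line(line):
    """Split one pool statistics line into a dict keyed by column name."""
    parts = line.split()
    row = {"Name": parts[0]}
    row.update({key: int(val) for key, val in zip(FIELDS[1:], parts[1:])})
    return row


def summarize(row):
    """Derive a few differences from the raw counters (assumed semantics)."""
    return {
        "item_kib": row["Size"] // 1024,
        "outstanding_items": row["Requests"] - row["Releases"],
        "net_pages": row["Pgreq"] - row["Pgrel"],
        "busy_pages": row["Npage"] - row["Idle"],
    }


# The two lines exactly as printed before and after aborting the build.
before = parse_pool_line(
    "execargs    262144  40498    0    40498     9     7     2     4     0    16    2")
after = parse_pool_line(
    "execargs    262144  40559    0    40559     9     7     2     4     0    16    2")

print(summarize(before))
print(summarize(after))
print("new requests between snapshots:",
      after["Requests"] - before["Requests"])
```

	Read this way, each item is 256 KiB, every request in both
	snapshots has a matching release, and both pages held by the
	pool are idle. The only difference between the snapshots is 61
	further requests, all of them released.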

	At this point experience tells me that I need to reboot
	to regain full use of the system.  I wish I didn't have
	to.

	exec_pool_alloc() in turn calls uvm_km_alloc(kernel_map, ...),
	but I don't know of a way to inspect the state of kernel_map,
	i.e. whether it has any free space left (probably not?).

>How-To-Repeat:
	Sorry, I don't have an exact recipe for reproducing the problem.

>Fix:
	I wish I knew.


