NetBSD-Bugs archive
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]
kern/51474: unbound can get stuck in fd_close()
>Number: 51474
>Category: kern
>Synopsis: unbound can get stuck in fd_close()
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Tue Sep 13 07:15:00 +0000 2016
>Originator: he%uninett.no@localhost
>Release: NetBSD 7.0
>Organization:
UNINETT AS
>Environment:
System: NetBSD osl-res.uninett.no 7.0 NetBSD 7.0 (GENERIC) #0: Wed Sep 30 14:46:48 CEST 2015 he%osl-res.uninett.no@localhost:/usr/obj/sys/arch/amd64/compile/GENERIC amd64
Architecture: x86_64
Machine: amd64
>Description:
I'm running unbound on one of my machines, and monitor it
using collectd. Collectd uses unbound-control periodically to
collect stats. It looks as if unbound in certain conditions
can become stuck in fd_close(). This has several consequences:
* unbound can then no longer be stopped with "sh
/etc/rc.d/unbound stop" (it just keeps on printing the PID
it's waiting for -- unbound's).
It also doesn't respond to "kill " (which by default sends
TERM).
* Manual use of unbound-control hangs.
Use of "crash" to dig a bit shows at least the kernel part of
its stack:
crash> ps
PID LID S CPU FLAGS STRUCT LWP * NAME WAIT
...
1101 12 3 7 1000000 fffffe88714394c0 unbound fdclose
1101 11 3 1 80 fffffe88714398e0 unbound kqueue
1101 10 3 8 80 fffffe88719b3080 unbound kqueue
1101 9 3 5 80 fffffe88719b34a0 unbound kqueue
1101 8 3 15 80 fffffe88719b38c0 unbound kqueue
1101 7 3 10 80 fffffe8872ee6060 unbound kqueue
1101 6 3 9 80 fffffe8872ee6480 unbound kqueue
1101 5 3 14 80 fffffe8872ee68a0 unbound kqueue
1101 4 3 0 80 fffffe88714a9040 unbound kqueue
1101 3 3 7 80 fffffe88714a9460 unbound kqueue
1101 2 3 3 80 fffffe88714a9880 unbound kqueue
1101 1 3 12 80 fffffe8872920020 unbound netio
...
crash> t/t 0t1101
trace: pid 1101 lid 12 at 0xfffffe813ad6dde0
sleepq_block() at sleepq_block+0xa2
cv_wait() at cv_wait+0x9a
fd_close() at fd_close+0x2a2
sys_close() at sys_close+0x1b
syscall() at syscall+0x9a
--- syscall (number 6) ---
7f7ff5e3c62a:
crash>
load averages: 0.16, 0.19, 0.11; up 14+15:39:18 09:03:34
79 threads: 76 sleeping, 1 stopped, 2 on CPU
CPU states: 0.3% user, 0.0% nice, 0.8% system, 0.1% interrupt, 98.6% idle
Memory: 7445M Act, 10M Wired, 23M Exec, 883M File, 22G Free
Swap: 32G Total, 32G Free
PID LID USERNAME PRI STATE TIME WCPU CPU NAME COMMAND
1101 9 unbound 43 CPU/5 441:03 6.93% 6.93% - unbound
1101 6 unbound 85 kqueue/9 123:16 2.25% 2.25% - unbound
1101 2 unbound 85 kqueue/1 140:56 2.20% 2.20% - unbound
1101 7 unbound 85 kqueue/7 100:36 0.29% 0.29% - unbound
1101 11 unbound 85 kqueue/1 114:07 0.24% 0.24% - unbound
1101 8 unbound 85 kqueue/1 108:10 0.24% 0.24% - unbound
1101 10 unbound 85 kqueue/8 88:19 0.20% 0.20% - unbound
1101 1 unbound 85 netio/12 102:12 0.00% 0.00% - unbound
1101 3 unbound 85 kqueue/6 96:36 0.00% 0.00% - unbound
1101 5 unbound 85 kqueue/0 77:38 0.00% 0.00% - unbound
1101 12 unbound 85 fdclos/7 70:59 0.00% 0.00% - unbound
1101 4 unbound 85 kqueue/2 68:00 0.00% 0.00% - unbound
Unbound itself appears to still carry out its main function
(being a DNS name server), but unbound can now no longer be
killed or restarted, forcing a reboot of the host OS to
recover.
Sending "kill -9" leaves two threads still active, clogging
up port 53 so unbound can't be restarted without a kernel reboot:
load averages: 0.06, 0.13, 0.10; up 14+15:43:17 09:07:33
77 threads: 75 sleeping, 1 stopped, 1 on CPU
CPU states: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 99.9% idle
Memory: 7449M Act, 10M Wired, 24M Exec, 882M File, 22G Free
Swap: 32G Total, 32G Free
PID LID USERNAME PRI STATE TIME WCPU CPU NAME COMMAND
1101 6 unbound 85 lwpwai/9 123:19 0.00% 0.00% - unbound
1101 12 unbound 95 fdclos/7 70:59 0.00% 0.00% - unbound
Output from "fstat -p 1101" after this action is:
osl-res# fstat -p 1101
USER CMD PID FD MOUNT INUM MODE SZ|DV R/W
unbound unbound 1101 root / 11993123 drwxr-xr-x 512 r
unbound unbound 1101 wd / 11993123 drwxr-xr-x 512 r
unbound unbound 1101 0 / 2482956 crw-rw-rw- null rw
unbound unbound 1101 1 / 2482956 crw-rw-rw- null rw
unbound unbound 1101 2 / 2482956 crw-rw-rw- null rw
unbound unbound 1101 3* crypto 0xfffffe8876dfcf18
unbound unbound 1101 4* internet dgram udp localhost:domain
unbound unbound 1101 5* internet stream tcp localhost:domain
unbound unbound 1101 6* internet6 dgram udp localhost:domain
unbound unbound 1101 7 flags 0x80034<ISTTY,MPSAFE,LOCKSWORK,CLEAN>
unbound unbound 1101 8 flags 0x80034<ISTTY,MPSAFE,LOCKSWORK,CLEAN>
unbound unbound 1101 9* pipe 0xfffffe88767bcdc0 -> 0x0 w
unbound unbound 1101 10* internet6 stream tcp localhost:domain
unbound unbound 1101 11* internet dgram udp osl-res.uninett.no:domain
unbound unbound 1101 12* internet stream tcp osl-res.uninett.no:domain
unbound unbound 1101 13* internet6 dgram udp [2001:700:0:102::ca53]:domain
unbound unbound 1101 14* internet6 stream tcp [2001:700:0:102::ca53]:domain
unbound unbound 1101 15* internet dgram udp dns-resolver1.uninett.no:domain
unbound unbound 1101 16* internet stream tcp dns-resolver1.uninett.no:domain
unbound unbound 1101 17* internet6 dgram udp dns-resolver1.uninett.no:domain
unbound unbound 1101 18* internet6 stream tcp dns-resolver1.uninett.no:domain
unbound unbound 1101 19* unix stream <-> /var/run/unbound-ctl [creat]
unbound unbound 1101 20* unix dgram <-> /var/run/log [using]
unbound unbound 1101 21* unix stream 0xfffffe83f73a8328 <-> 0xfffffe83f73a82b8
unbound unbound 1101 22* unix stream 0xfffffe83f73a82b8 <-> 0xfffffe83f73a8328
unbound unbound 1101 23* unix stream 0xfffffe83f73a8248 <-> 0xfffffe83f73a81d8
unbound unbound 1101 24* unix stream 0xfffffe83f73a81d8 <-> 0xfffffe83f73a8248
unbound unbound 1101 25* unix stream 0xfffffe83f73a8168 <-> 0xfffffe83f73a80f8
unbound unbound 1101 26* unix stream 0xfffffe83f73a80f8 <-> 0xfffffe83f73a8168
unbound unbound 1101 27* unix stream 0xfffffe83f73a8088 <-> 0xfffffe83f73a8018
unbound unbound 1101 28* unix stream 0xfffffe83f73a8018 <-> 0xfffffe83f73a8088
unbound unbound 1101 29* unix stream 0xfffffe8872f3ef70 <-> 0xfffffe8872f3ef00
unbound unbound 1101 30* unix stream 0xfffffe8872f3ef00 <-> 0xfffffe8872f3ef70
unbound unbound 1101 31* unix stream 0xfffffe8872f3ee90 <-> 0xfffffe8872f3ee20
unbound unbound 1101 32* unix stream 0xfffffe8872f3ee20 <-> 0xfffffe8872f3ee90
unbound unbound 1101 33* unix stream 0xfffffe8872f3edb0 <-> 0xfffffe8872f3ed40
unbound unbound 1101 34* unix stream 0xfffffe8872f3ed40 <-> 0xfffffe8872f3edb0
unbound unbound 1101 35* unix stream 0xfffffe8872f3ecd0 <-> 0xfffffe8872f3ec60
unbound unbound 1101 36* unix stream 0xfffffe8872f3ec60 <-> 0xfffffe8872f3ecd0
unbound unbound 1101 37* unix stream 0xfffffe8872f3ebf0 <-> 0xfffffe8872f3eb80
unbound unbound 1101 38* unix stream 0xfffffe8872f3eb80 <-> 0xfffffe8872f3ebf0
unbound unbound 1101 39* unix stream 0xfffffe8872f3eb10 <-> 0xfffffe8872f3eaa0
unbound unbound 1101 40* unix stream 0xfffffe8872f3eaa0 <-> 0xfffffe8872f3eb10
unbound unbound 1101 41* unix stream 0xfffffe8872f3ea30 <-> 0xfffffe8872f3e9c0
unbound unbound 1101 42* unix stream 0xfffffe8872f3e9c0 <-> 0xfffffe8872f3ea30
unbound unbound 1101 43* unix stream 0xfffffe8872f3e950 <-> 0xfffffe8872f3e8e0
unbound unbound 1101 44* unix stream 0xfffffe8872f3e8e0 <-> 0xfffffe8872f3e950
unbound unbound 1101 45* kqueue pending 8
unbound unbound 1101 46* kqueue pending 7
unbound unbound 1101 47* kqueue pending 13
unbound unbound 1101 48* kqueue pending 8
unbound unbound 1101 49* kqueue pending 7
unbound unbound 1101 50* kqueue pending 9
unbound unbound 1101 51* kqueue pending 9
unbound unbound 1101 52* kqueue pending 17
unbound unbound 1101 53* kqueue pending 10
unbound unbound 1101 54* kqueue pending 8
unbound unbound 1101 55* kqueue pending 9
unbound unbound 1101 56* kqueue pending 9
unbound unbound 1101 57* internet dgram udp *:35674
unbound unbound 1101 58* internet dgram udp *:10637
unbound unbound 1101 59* internet dgram udp *:30490
unbound unbound 1101 60* internet dgram udp *:19156
unbound unbound 1101 61* internet dgram udp *:31394
unbound unbound 1101 62* internet dgram udp *:32203
unbound unbound 1101 63* internet dgram udp *:63713
unbound unbound 1101 64* unix stream <-> /var/run/unbound-ctl [creat]
unbound unbound 1101 65* internet dgram udp *:36459
unbound unbound 1101 66* internet dgram udp *:56616
unbound unbound 1101 67* internet dgram udp *:21417
unbound unbound 1101 68* internet dgram udp *:57035
unbound unbound 1101 69* internet dgram udp *:22686
unbound unbound 1101 70* internet dgram udp *:49669
unbound unbound 1101 71* internet stream tcp
unbound unbound 1101 72* internet6 dgram udp *:53378
unbound unbound 1101 73* internet6 dgram udp *:20212
unbound unbound 1101 74* internet stream tcp
unbound unbound 1101 76* internet dgram udp *:9932
unbound unbound 1101 77* internet dgram udp *:12556
unbound unbound 1101 78* internet dgram udp *:34919
unbound unbound 1101 79* internet dgram udp *:64843
unbound unbound 1101 80* internet dgram udp *:43554
unbound unbound 1101 81* internet dgram udp *:52510
unbound unbound 1101 82* internet stream tcp
unbound unbound 1101 83* internet dgram udp *:43543
unbound unbound 1101 85* internet6 dgram udp *:18430
unbound unbound 1101 86* internet dgram udp *:11819
unbound unbound 1101 87* internet dgram udp *:6392
unbound unbound 1101 88* internet dgram udp *:62296
unbound unbound 1101 89* internet dgram udp *:65501
unbound unbound 1101 90* internet dgram udp *:15730
unbound unbound 1101 91* internet dgram udp *:8304
unbound unbound 1101 92* internet dgram udp *:31388
unbound unbound 1101 93* internet dgram udp *:24339
unbound unbound 1101 94* internet6 dgram udp *:37859
unbound unbound 1101 95* internet6 dgram udp *:42976
unbound unbound 1101 96* internet dgram udp *:53401
unbound unbound 1101 97* internet dgram udp *:36576
unbound unbound 1101 98* internet dgram udp *:12451
unbound unbound 1101 100* internet dgram udp *:26113
unbound unbound 1101 102* internet dgram udp *:41035
unbound unbound 1101 114* internet6 dgram udp *:14168
>How-To-Repeat:
Run unbound on 7.0 with relatively light load (a few hundred
queries/sec). Monitor it with collectd, watch it periodically
get stuck in this manner.
>Fix:
Sorry, don't know. Hints for debugging this further
gratefully accepted.
Home |
Main Index |
Thread Index |
Old Index