Current-Users archive
problem with httpd hang
Hello,
For some time I have had an issue with Apache (2.2) hanging on
ftp.fr.netbsd.org (running a recent 7.0_BETA). When this happens,
port 80 is still open and accepts connections, but requests are not handled.
This seems to be because httpd no longer does anything, and zombies
are not properly reaped:
antioche:/home/bouyer>ps axuww |grep http
www 1428 0.0 0.0 0 0 ? Z - 0:00.00 (httpd)
root 1649 0.0 0.1 74892 5368 ? Is 1:23PM 0:17.99 /usr/pkg/sbin/httpd -k start
www 2374 0.0 0.0 0 0 ? Z - 0:00.00 (httpd)
www 4303 0.0 0.0 0 0 ? Z - 0:00.00 (httpd)
www 5076 0.0 0.0 0 0 ? Z - 0:00.00 (httpd)
www 6364 0.0 0.0 0 0 ? Z - 0:00.00 (httpd)
www 6642 0.0 0.0 0 0 ? Z - 0:00.00 (httpd)
www 9104 0.0 0.0 0 0 ? Z - 0:00.00 (httpd)
www 9956 0.0 0.0 0 0 ? Z - 0:00.00 (httpd)
www 19196 0.0 0.0 0 0 ? Z - 0:00.00 (httpd)
www 21177 0.0 0.0 0 0 ? Z - 0:00.00 (httpd)
www 21193 0.0 0.0 0 0 ? Z - 0:00.00 (httpd)
www 22053 0.0 0.0 0 0 ? Z - 0:00.00 (httpd)
www 23249 0.0 0.0 0 0 ? Z - 0:00.00 (httpd)
www 29014 0.0 0.0 0 0 ? Z - 0:00.00 (httpd)
www 29271 0.0 0.0 0 0 ? Z - 0:00.00 (httpd)
antioche:/home/bouyer>ps axlww | grep http
98 1428 1649 0 0 0 0 0 - Z ? 0:00.00 (httpd)
0 1649 1 0 85 0 74892 5368 pipe_wr Is ? 0:17.99 /usr/pkg/sbin/httpd -k start
98 2374 1649 0 0 0 0 0 - Z ? 0:00.00 (httpd)
98 4303 1649 0 0 0 0 0 - Z ? 0:00.00 (httpd)
98 5076 1649 0 0 0 0 0 - Z ? 0:00.00 (httpd)
98 6364 1649 0 0 0 0 0 - Z ? 0:00.00 (httpd)
98 6642 1649 0 0 0 0 0 - Z ? 0:00.00 (httpd)
98 9104 1649 0 0 0 0 0 - Z ? 0:00.00 (httpd)
98 9956 1649 0 0 0 0 0 - Z ? 0:00.00 (httpd)
98 19196 1649 0 0 0 0 0 - Z ? 0:00.00 (httpd)
98 21177 1649 0 0 0 0 0 - Z ? 0:00.00 (httpd)
98 21193 1649 0 0 0 0 0 - Z ? 0:00.00 (httpd)
98 22053 1649 0 0 0 0 0 - Z ? 0:00.00 (httpd)
98 23249 1649 0 0 0 0 0 - Z ? 0:00.00 (httpd)
98 29014 1649 0 0 0 0 0 - Z ? 0:00.00 (httpd)
98 29271 1649 0 0 0 0 0 - Z ? 0:00.00 (httpd)
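All the Z entries have PPID 1649: the children have exited, but the master has not collected their exit status with wait(2)/waitpid(2), so they remain zombies. For reference, here is a minimal sketch of non-blocking reaping. It uses Python's os module, which wraps the same system calls. The helper names are made up for illustration, and this is not Apache's code:

```python
import os
import time

def spawn_exiting_children(n):
    """Fork n children that exit immediately; return their PIDs."""
    pids = []
    for _ in range(n):
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: exits at once and stays a zombie until reaped
        pids.append(pid)
    return pids

def reap_children():
    """Collect every child that has already exited, without blocking.

    waitpid(-1, WNOHANG) returns (0, 0) while children exist but none
    has exited yet, and raises ChildProcessError (ECHILD) once there
    are no children left at all.
    """
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break            # no children remain
        if pid == 0:
            break            # the remaining children are still running
        reaped.append(pid)
    return reaped

if __name__ == "__main__":
    pids = spawn_exiting_children(3)
    time.sleep(0.5)          # let the children exit; until reaped they show as Z in ps
    print("reaped:", sorted(reap_children()), "expected:", sorted(pids))
```

A parent normally runs such a loop from its main loop or after SIGCHLD. A master that is itself asleep in a system call never gets there, which would be consistent with the zombies piling up.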
Process 1649 (the master httpd) seems to be stuck on a pipe write: its WCHAN is pipe_wr. fstat tells me:
antioche:/home/bouyer>fstat -p 1649
USER CMD PID FD MOUNT INUM MODE SZ|DV R/W
root httpd 1649 wd / 2 drwxr-xr-x 512 r
root httpd 1649 0 / 51846 crw-rw-rw- null r
root httpd 1649 1 / 51846 crw-rw-rw- null w
root httpd 1649 2 /var 42803 -rw-r--r-- 131768080 w
root httpd 1649 3* crypto 0xfffffe810efad7e0
root httpd 1649 4* internet stream tcp *:http
root httpd 1649 5* internet6 stream tcp *:http
root httpd 1649 6* crypto 0xfffffe810efad888
root httpd 1649 7 flags 0x80034<ISTTY,MPSAFE,LOCKSWORK,CLEAN>
root httpd 1649 8 flags 0x80034<ISTTY,MPSAFE,LOCKSWORK,CLEAN>
root httpd 1649 9* pipe 0xfffffe810f783dc0 -> 0x0 w
root httpd 1649 10 / 41817 -rw-r--r-- 53 r
root httpd 1649 11* pipe 0xfffffe81da43aa28 <- 0xfffffe81d83a03e8 rn
root httpd 1649 12* pipe 0xfffffe81d83a03e8 -> 0xfffffe81da43aa28 w
root httpd 1649 13 /var 42823 -rw-r--r-- 2358066 w
root httpd 1649 14 /var 42807 -rw-r--r-- 3449145 w
root httpd 1649 15 /var 42807 -rw-r--r-- 3449145 w
root httpd 1649 16 /var 42901 -rw-r--r-- 121307 w
This points to 2 pipes open in write mode: one with no more readers (fd 9),
and one with only a single reader left, process 1649 itself (fd 12, read through fd 11).
I don't know if it's writing to 0xfffffe810f783dc0 and failed to get
a SIGPIPE, or if it's writing to 0xfffffe81d83a03e8 (in which case
it's a real deadlock; I guess httpd failed to properly close the read
end of its pipe after fork). That second pipe seems to be inherited by all
child httpd processes, and maybe it's used for communication between master and slaves.
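The two hypotheses correspond to different kernel behaviour. With no reader left (like fd 9, "-> 0x0"), write(2) does not block: it fails with EPIPE and raises SIGPIPE. With a reader still open (like fd 12, whose read end fd 11 is held by 1649 itself), the writer fills the buffer and then sleeps until someone reads. The sketch below shows both cases. The helper name is hypothetical. The write end is made non-blocking so that "would sleep" shows up as EAGAIN instead of hanging the test:

```python
import os

def probe_pipe_write(keep_reader):
    """Write to a pipe until the kernel refuses, and report why.

    keep_reader=False: the read end is closed first (no reader at all).
    keep_reader=True:  the writer itself still holds the read end but
                       never reads from it.
    The write end is non-blocking, so "the pipe is full" surfaces as
    EAGAIN (BlockingIOError) instead of sleeping forever in write(2).
    CPython ignores SIGPIPE at startup, so EPIPE surfaces as
    BrokenPipeError rather than killing the process.
    Returns (outcome, bytes_written).
    """
    r, w = os.pipe()
    os.set_blocking(w, False)
    if not keep_reader:
        os.close(r)
    written = 0
    try:
        while True:
            written += os.write(w, b"x" * 512)
    except BrokenPipeError:
        return "EPIPE", written
    except BlockingIOError:
        return "would block", written
    finally:
        os.close(w)
        if keep_reader:
            os.close(r)

if __name__ == "__main__":
    print(probe_pipe_write(keep_reader=False))
    print(probe_pipe_write(keep_reader=True))
```

On a correctly behaving kernel, only the second case ends with a process asleep in write(2). The first would require a missed wakeup or a missed SIGPIPE when the last reader went away.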
The pipe on fd 9 is common to all processes started from /etc/rc.
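Descriptors are duplicated by fork(2), so a pipe "has a reader" as long as any process still holds a copy of the read end. The writer gets EPIPE only once the last copy is closed. This is why a read end that the master (or any child) never closes keeps the pipe alive. A small sketch of that rule follows. The helper name is hypothetical, and it uses Python's os module:

```python
import os

def reader_survives_in_child():
    """Return (write_ok_while_child_alive, epipe_after_child_exit)."""
    r, w = os.pipe()
    go_r, go_w = os.pipe()              # control pipe: tells the child when to exit
    pid = os.fork()
    if pid == 0:                        # child: keeps its inherited copy of r open
        os.close(w)
        os.close(go_w)
        os.read(go_r, 1)                # block until the parent says "exit"
        os._exit(0)
    os.close(r)                         # parent drops its own copy of the read end
    os.close(go_r)
    try:
        ok_while_alive = os.write(w, b"x") == 1   # the child still counts as a reader
        os.write(go_w, b"!")            # let the child exit
        os.waitpid(pid, 0)              # reap it: all of its descriptors are closed now
        try:
            os.write(w, b"x")
            epipe_after_exit = False
        except BrokenPipeError:         # no copy of the read end is left anywhere
            epipe_after_exit = True
    finally:
        os.close(w)
        os.close(go_w)
    return ok_while_alive, epipe_after_exit
```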
gdb tells me about process 1649:
[Switching to LWP 1]
0x00007f7ff583c02a in write () from /usr/lib/libc.so.12
(gdb) where
#0 0x00007f7ff583c02a in write () from /usr/lib/libc.so.12
#1 0x00007f7ff6007388 in write () from /usr/lib/libpthread.so.1
#2 0x00007f7ff6c1832f in apr_file_write () from /usr/pkg/lib/libapr-1.so.0
#3 0x000000000044b7ce in pod_signal_internal ()
#4 0x000000000044ba75 in ap_mpm_pod_signal ()
#5 0x0000000000458ac6 in perform_idle_server_maintenance ()
#6 0x00000000004590b8 in ap_mpm_run ()
#7 0x0000000000425d7d in main ()
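In httpd's MPM code, the "pod" in ap_mpm_pod_signal() is the "pipe of death". The master writes to a pipe shared with its children to ask an idle child to exit, and the children read from it. The sketch below is a hypothetical, much simplified model of that pattern. The class and method names are invented, and it is not Apache or APR code. It shows why the master keeps the read end open and how a blocking signal() could hang:

```python
import os

class PipeOfDeath:
    """Hypothetical, simplified model of an MPM "pipe of death".

    The master keeps BOTH ends open: the read end has to stay around so
    that children forked later inherit it. Only a child's check()
    ever drains the pipe.
    """

    def __init__(self):
        self.read_fd, self.write_fd = os.pipe()
        os.set_blocking(self.read_fd, False)   # children poll without sleeping

    def signal(self):
        """Master side: ask one child to exit by queueing one byte.

        The write end is blocking: if no child drains the pipe and the
        buffer fills up, this call sleeps in write(2), as in frame #0 of
        the backtrace. If the master is then the only remaining reader,
        nothing will wake it up.
        """
        os.write(self.write_fd, b"!")

    def check(self):
        """Child side: return True if a 'please exit' byte was waiting."""
        try:
            return os.read(self.read_fd, 1) == b"!"
        except BlockingIOError:
            return False                        # no request queued

    def close(self):
        os.close(self.read_fd)
        os.close(self.write_fd)

if __name__ == "__main__":
    pod = PipeOfDeath()
    pod.signal()
    print(pod.check(), pod.check())            # one byte queued, one consumed
    pod.close()
```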
Attaching to the process with gdb and then exiting gdb un-hung the
process; it's working again. The pipes on fd 9, 11 and 12 are still there.
Any idea how to debug this further?
This had been running fine for months and started happening only a few days
ago. It's not related to an upgrade (I upgraded only to see if it would
fix the problem); I guess something in the load pattern has changed.
--
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
NetBSD: 26 years of experience will always make the difference