Current-Users archive


Re: problem with httpd hang



On Fri, Feb 13, 2015 at 12:36:39PM +0100, Manuel Bouyer wrote:
> Hello,
> for some time I have an issue with apache (2.2) hangs on
> ftp.fr.netbsd.org (running a recent 7.0_BETA). When this happens,
> port 80 is still open and accepts connections, but requests are not handled.
> This seems to be because httpd doesn't do anything more, and zombies
> are not properly reaped:
> [...]
> It seems to be stuck on a pipe write. fstat tells me:
> antioche:/home/bouyer>fstat -p 1649
> USER     CMD          PID   FD MOUNT       INUM MODE         SZ|DV R/W
> root     httpd       1649   wd /              2 drwxr-xr-x     512 r 
> root     httpd       1649    0 /          51846 crw-rw-rw-    null r 
> root     httpd       1649    1 /          51846 crw-rw-rw-    null w 
> root     httpd       1649    2 /var       42803 -rw-r--r--  131768080 w 
> root     httpd       1649    3* crypto 0xfffffe810efad7e0
> root     httpd       1649    4* internet stream tcp *:http
> root     httpd       1649    5* internet6 stream tcp *:http
> root     httpd       1649    6* crypto 0xfffffe810efad888
> root     httpd       1649    7 flags 0x80034<ISTTY,MPSAFE,LOCKSWORK,CLEAN>
> root     httpd       1649    8 flags 0x80034<ISTTY,MPSAFE,LOCKSWORK,CLEAN>
> root     httpd       1649    9* pipe 0xfffffe810f783dc0 -> 0x0 w
> root     httpd       1649   10 /          41817 -rw-r--r--      53 r 
> root     httpd       1649   11* pipe 0xfffffe81da43aa28 <- 0xfffffe81d83a03e8 rn
> root     httpd       1649   12* pipe 0xfffffe81d83a03e8 -> 0xfffffe81da43aa28 w
> root     httpd       1649   13 /var       42823 -rw-r--r--  2358066 w 
> root     httpd       1649   14 /var       42807 -rw-r--r--  3449145 w 
> root     httpd       1649   15 /var       42807 -rw-r--r--  3449145 w 
> root     httpd       1649   16 /var       42901 -rw-r--r--  121307 w 
> 
> to 2 pipes open in write mode: one with no more readers, and one
> with only a single reader left, process 1649 itself.
> I don't know if it's writing to 0xfffffe810f783dc0 and failed to get
> a SIGPIPE, or if it's writing to 0xfffffe81d83a03e8 (in which case
> it's a real deadlock, I guess httpd failed to properly close the read
> end of its pipe after fork). It seems to be inherited by all child httpd
> and maybe it's used for communications between master and slaves.
> The pipe on fd 9 is common to all processes started from /etc/rc.

I've got a ktrace of httpd (and its children) up to the hang.
The hung write is to fd 12. Apache is writing a single character
('!') there, which tells one of the children to exit gracefully.
It seems to be expected that both ends of the pipe are shared by the
parent and children (in particular, it's expected that the parent keeps
the read side of the pipe open).
What ktrace reveals is that the parent does a write(12, '!', 1) which
hangs, and then the children all do a read(11, &c, 1), reading 16 '!'
from the pipe in total. This doesn't cause the write() in the parent to
return, and this may be a problem. I don't know if the standards say that
a single-character write to a pipe should unblock when a single character
is read from the other end of the pipe.
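
One way to probe the question above on a given system is the rough sketch
below. It is not Apache's code: it fills a pipe with one-byte non-blocking
writes, reads a few bytes back, and asks select() whether the write end has
become writable. The function names (fill_pipe, is_writable, probe_pipe) are
made up for this illustration, and select() is only a stand-in for "would a
blocked write() return"; the answer is expected to differ between kernels.

```python
import os
import select


def fill_pipe(wfd):
    """Write '!' one byte at a time until the pipe is full; return the byte count."""
    os.set_blocking(wfd, False)  # a full pipe now raises instead of hanging
    written = 0
    try:
        while True:
            written += os.write(wfd, b"!")
    except BlockingIOError:
        return written


def is_writable(wfd):
    """Ask select() whether a write on wfd could proceed right now."""
    _, writable, _ = select.select([], [wfd], [], 0)
    return bool(writable)


def probe_pipe(nreads):
    """Fill a pipe, read nreads single bytes, report (capacity, was_full, writable_now)."""
    rfd, wfd = os.pipe()
    try:
        capacity = fill_pipe(wfd)
        was_full = not is_writable(wfd)
        for _ in range(nreads):
            os.read(rfd, 1)
        return capacity, was_full, is_writable(wfd)
    finally:
        os.close(rfd)
        os.close(wfd)


if __name__ == "__main__":
    # Compare how many single-byte reads are needed before writes can proceed.
    for n in (1, 16):
        print(n, probe_pipe(n))
```

If the last element of the tuple is still False after one read, the kernel
wants more than one free byte before it wakes a writer, which would match
what the ktrace shows.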

The ktrace counts 935 writes and only 868 reads (and 868 calls to fork), so
the root cause of the issue may be that the master process sends too many
'!' to its children.
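
The trace shows 935 - 868 = 67 more writes than reads. To illustrate why a
persistent surplus matters, here is a toy model, not taken from Apache: a
"master" writes more '!' bytes per round than the "children" read, and the
function reports the round in which the write would have blocked. The name
rounds_until_writer_blocks and its parameters are hypothetical, and the pipe
is made non-blocking only so the demo can detect the condition instead of
hanging.

```python
import os


def rounds_until_writer_blocks(writes_per_round, reads_per_round, max_rounds=100000):
    """Return the first round in which a one-byte write would block, else None."""
    rfd, wfd = os.pipe()
    os.set_blocking(wfd, False)
    os.set_blocking(rfd, False)
    try:
        for round_no in range(1, max_rounds + 1):
            try:
                for _ in range(writes_per_round):
                    os.write(wfd, b"!")  # master: one '!' per child to stop
            except BlockingIOError:
                return round_no  # pipe is full: a blocking write() would hang here
            try:
                os.read(rfd, reads_per_round)  # children: consume some of the '!'
            except BlockingIOError:
                pass  # nothing left to read in this round
        return None
    finally:
        os.close(rfd)
        os.close(wfd)


if __name__ == "__main__":
    print("balanced:", rounds_until_writer_blocks(1, 1, max_rounds=1000))
    print("surplus: ", rounds_until_writer_blocks(8, 1))
```

With equal counts the writer never blocks; with any steady surplus it
eventually does, and how soon depends on the pipe capacity of the system.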

-- 
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
     NetBSD: 26 years of experience will always make the difference
--

