Current-Users archive


File descriptor leak involving kqueue



Lately I've been running my postfix with the opendkim and opendmarc milters
(both from pkgsrc).  Something about opendmarc is bleeding the system dry
of file descriptors, and I'd appreciate some help thinking about how to
find out exactly what's going on.

fstat shows me this:

opendmar opendmarc  12624   wd /          64910 drwxr-xr-x    1024 r
opendmar opendmarc  12624    0 /          43280 crw-rw-rw-    null rw
opendmar opendmarc  12624    1 /          43280 crw-rw-rw-    null rw
opendmar opendmarc  12624    2 /          43280 crw-rw-rw-    null rw
opendmar opendmarc  12624    3* unix dgram  <-> /var/run/log [using]
opendmar opendmarc  12624    4* internet stream tcp localhost:ddi-tcp-6
opendmar opendmarc  12624    5* internet stream tcp localhost:52067 <-> localhost:ddi-tcp-6
opendmar opendmarc  12624    6* kqueue pending 0
opendmar opendmarc  12624    7* kqueue pending 0
opendmar opendmarc  12624    8* kqueue pending 0
opendmar opendmarc  12624    9* kqueue pending 0
opendmar opendmarc  12624   10* kqueue pending 0
opendmar opendmarc  12624   11* kqueue pending 0
opendmar opendmarc  12624   12* kqueue pending 0

...and then the list of kqueue lines just grows and grows over time.
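
For watching this from inside a test program (rather than running fstat by
hand), a quick way to count open descriptors is to probe each number with
fcntl(F_GETFD).  A minimal sketch of my own, nothing opendmarc-specific
about it:

  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  /* Count open file descriptors by probing 0..OPEN_MAX-1 with F_GETFD. */
  static int
  count_open_fds(void)
  {
          int maxfd = (int)sysconf(_SC_OPEN_MAX);
          int n = 0;

          for (int fd = 0; fd < maxfd; fd++)
                  if (fcntl(fd, F_GETFD) != -1)
                          n++;
          return n;
  }

  int
  main(void)
  {
          printf("open fds: %d\n", count_open_fds());
          return 0;
  }

Calling something like count_open_fds() before and after a batch of
lookups makes the growth easy to see.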

A ktrace of the process suggests that the kqueue use comes from the
resolver.  The pattern looks like this:

  7013     31 opendmarc CALL  open(0x73e95572c060,0x400000,0x1b6)
  7013     31 opendmarc NAMI  "/etc/resolv.conf"
  7013     31 opendmarc RET   open 9
  7013     31 opendmarc CALL  __fstat50(9,0x73e9522d9c00)
  7013     31 opendmarc RET   __fstat50 0
  7013     31 opendmarc CALL  read(9,0x73e950d37000,0x4000)
  7013     31 opendmarc GIO   fd 9 read 47 bytes
       "nameserver 193.71.27.8\nsearch hamartun.priv.no\n"
  7013     31 opendmarc RET   read 47/0x2f
  7013     31 opendmarc CALL  read(9,0x73e950d37000,0x4000)
  7013     31 opendmarc GIO   fd 9 read 0 bytes
       ""
  7013     31 opendmarc RET   read 0
  7013     31 opendmarc CALL  fcntl(9,F_DUPFD_CLOEXEC,0x73e95613d000)
  7013     31 opendmarc RET   fcntl -1 errno 22 Invalid argument
  7013     31 opendmarc CALL  close(9)
  7013     31 opendmarc RET   close 0
  7013     31 opendmarc CALL  __fstat50(0xffffffff,0x73e9522d9e50)
  7013     31 opendmarc RET   __fstat50 -1 errno 9 Bad file descriptor
  7013     31 opendmarc CALL  kqueue1(0x400000)
  7013     31 opendmarc RET   kqueue1 9
  7013     31 opendmarc CALL  __kevent50(9,0x73e9522d9e20,1,0,0,0x73e95572c080)
  7013     31 opendmarc RET   __kevent50 -1 errno 9 Bad file descriptor
  7013     31 opendmarc CALL  __gettimeofday50(0x73e9522d9e50,0)
  7013     31 opendmarc RET   __gettimeofday50 0
  7013     31 opendmarc CALL  __kevent50(9,0,0,0x73e9522d9860,1,0x73e95572c080)
  7013     31 opendmarc RET   __kevent50 0
  7013     31 opendmarc CALL  __socket30(2,0x10000002,0)
  7013     31 opendmarc RET   __socket30 10/0xa
  7013     31 opendmarc CALL  connect(0xa,0x73e9522da6d4,0x10)
  7013     31 opendmarc RET   connect 0
  7013     31 opendmarc CALL  sendto(0xa,0x73e9522da010,0x24,0,0,0)
  7013     31 opendmarc MISC  msghdr: [name=0x0, namelen=0, iov=0xfffffe81015b8e50, iovlen=1, control=0x0, controllen=2166179032, flags=0]
  7013     31 opendmarc GIO   fd 10 wrote 36 bytes
       "\^]I\^A\0\0\^A\0\0\0\0\0\0\^F_dmarc\akeithf4\^Ccom\0\0\^P\0\^A"
  7013     31 opendmarc RET   sendto 36/0x24
  7013     31 opendmarc CALL  __clock_gettime50(0,0x73e9522d9890)
  7013     31 opendmarc RET   __clock_gettime50 0
  7013     31 opendmarc CALL  poll(0x73e9522d9970,1,0x1388)
  7013     31 opendmarc RET   poll 1
  7013     31 opendmarc CALL  recvfrom(0xa,0x73e9522da8f0,0x2000,0,0x73e9522d9990,0x73e9522d9964)
  7013     31 opendmarc MISC  msghdr: [name=0x0, namelen=29, iov=0xfffffe81015b8e40, iovlen=1, control=0x0, controllen=3021220544, flags=0]
  7013     31 opendmarc GIO   fd 10 read 100 bytes
       "\^]I\M^A\M^C\0\^A\0\0\0\^A\0\0\^F_dmarc\akeithf4\^Ccom\0\0\^P\0\^A\M-@\^S\0\^F\0\^A\0\0\a\b\0004\^Cns1\fdigita\
	locean\M-@\^[\nhostmaster\M-@\^SW\M^X\^XC\0\0*0\0\0\^N\^P\0	:\M^@\0\0\a\b"
  7013     31 opendmarc MISC  mbsoname: [193.71.27.8]
  7013     31 opendmarc RET   recvfrom 100/0x64
  7013     31 opendmarc CALL  close(0xa)
  7013     31 opendmarc RET   close 0
  7013     31 opendmarc CALL  __gettimeofday50(0x73e9522d9da0,0)
  7013     31 opendmarc RET   __gettimeofday50 0
  7013     31 opendmarc CALL  getpid
  7013     31 opendmarc RET   getpid 7013/0x1b65, 1
  7013     31 opendmarc CALL  __gettimeofday50(0x73e9522d9d30,0)
  7013     31 opendmarc RET   __gettimeofday50 0

...and then it reads /etc/resolv.conf again, to do another lookup.  This
time the kqueue1() call returns fd 10 (the number just freed by closing
the socket that was used to talk to the name server).  The next time
around we get 11, then 12, and so on...
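
If the leak really is in libc, it ought to be reproducible without
opendmarc at all.  Something along these lines should show the fd numbers
climbing by one per lookup if the resolver is at fault.  This is an
untested sketch of mine: the name being queried is just a placeholder,
and F_MAXFD is the NetBSD-specific fcntl command that reports the highest
descriptor currently open.  As far as I remember the resolver lives in
libc on NetBSD, so no extra library should be needed to link it:

  #include <sys/types.h>
  #include <netinet/in.h>
  #include <arpa/nameser.h>
  #include <resolv.h>
  #include <fcntl.h>
  #include <stdio.h>

  int
  main(void)
  {
          unsigned char answer[NS_PACKETSZ];

          res_init();
          for (int i = 0; i < 20; i++) {
                  /* Any TXT lookup will do; we only care about fd usage. */
                  res_query("_dmarc.example.com", ns_c_in, ns_t_txt,
                      answer, (int)sizeof(answer));
                  /* F_MAXFD: NetBSD-specific, highest fd currently open. */
                  printf("after lookup %d: highest fd = %d\n",
                      i, fcntl(0, F_MAXFD));
          }
          return 0;
  }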

It looks like the fd returned by the kqueue system call is never closed.
I'm guessing /usr/src/lib/libc/resolv/res_send.c is where the action is,
but it's strange that opendmarc in particular is exhibiting the problem.
On the other hand, it does do a *lot* of resolving.  Could there be a bug
that leaves the kqueue fd open in certain situations?
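
I haven't actually dug through res_send.c yet, so this is pure
speculation, but the kind of bug I'm imagining would look roughly like
the sketch below: a throwaway kqueue used to wait for the DNS socket,
with a timeout/error path that forgets to close it.  Illustrative only,
not the real libc code (error handling omitted):

  #include <sys/types.h>
  #include <sys/event.h>
  #include <sys/time.h>
  #include <sys/socket.h>
  #include <netinet/in.h>
  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  /*
   * Illustrative only: wait for readability on a socket using a
   * throwaway kqueue.  If an early return skips the close(), one
   * kqueue descriptor leaks per call -- which would match the
   * growing list of kqueues in the fstat output above.
   */
  static int
  wait_readable(int sock, const struct timespec *timo)
  {
          struct kevent ev;
          int kq, n;

          if ((kq = kqueue()) == -1)
                  return -1;

          EV_SET(&ev, sock, EVFILT_READ, EV_ADD, 0, 0, 0);
          n = kevent(kq, &ev, 1, &ev, 1, timo);
          if (n <= 0)
                  return -1;      /* BUG: kq never closed on this path */

          (void)close(kq);
          return 0;
  }

  int
  main(void)
  {
          struct timespec timo = { 0, 1000000 };  /* 1 ms */
          int s = socket(AF_INET, SOCK_DGRAM, 0);

          for (int i = 0; i < 5; i++) {
                  wait_readable(s, &timo);        /* times out -> leaks */
                  /* F_MAXFD is NetBSD-specific, as above. */
                  printf("highest fd now: %d\n", fcntl(0, F_MAXFD));
          }
          return 0;
  }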

Anyone recognize this pattern?

I'm running -current from about two months ago.

-tih
-- 
I like long walks, especially when they are taken by people who annoy me.

