tech-net archive
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]
Re: [Fwd: Re: dccifd: restart after signal 6]
In article <20090607093815.GA4811%pintail.smokva.net@localhost>,
Petar Bogdanovic <petar%smokva.net@localhost> wrote:
How many years will it take people to switch from res_foo to res_nfoo
which is re-entrant? Really, these functions have been there for
more than a decade. There is no excuse for a multi-threaded program
not to use them if they are available and instead use their own
API and do their own locking. What is next? Don't use the _r functions
and do your own locking? Add mutexes to protect a possibly non-thread-safe
malloc? Unless of course I am missing something, and if so I apologize.
99% of the programs that use _res in a multi-threaded environment do
so incorrectly, and it is a good thing to make them use the new API's.
Someone made a mistake exposing _res a *long* time ago. Let's not
perpetuate it by having new code try to use it.
christos
>Hi,
>
>attached is an excerpt from the [1]DCC mailing list. One of the dcc
>daemons started to die unexpectedly because it used _res in conjunction
>with threads which seems to be a weird combination on NetBSD even if you
>assume that your resolver is not thread safe and lock him properly:
>
>/usr/src/include/resolv.h:
> /*
> * Source and Binary compatibility; _res will not work properly
> * with multi-threaded programs.
> */
> extern struct __res_state *__res_state(void);
> #define _res (*__res_state())
>
>/usr/src/lib/libpthread/res_state.c:
> /*
> * This is aliased via a macro to _res; don't allow multi-threaded
> programs
> * to use it.
> */
> res_state
> __res_state(void)
> {
> static const char res[] = "_res is not supported for
> multi-threaded"
> " programs.\n";
> (void)write(STDERR_FILENO, res, sizeof(res) - 1);
> abort();
> return NULL;
> }
>
>
>At least that's how I interpreted the story so it would be nice if
>somebody here could analyze/comment the issue.
>
>Thank you,
>
>
>
> Petar Bogdanovic
>
>
>
>[1] http://www.rhyolite.com/dcc/
>
>----- Forwarded message from Vernon Schryver
><vjs%calcite.rhyolite.com@localhost> -----
>
>Date: Fri, 5 Jun 2009 21:52:05 GMT
>From: Vernon Schryver <vjs%calcite.rhyolite.com@localhost>
>To: dcc%rhyolite.com@localhost, lists-dcc%cappella.us@localhost
>
>> From: MrC <lists-dcc%cappella.us@localhost>
>
>> As expected, this is the result of a kill(2) call.
>>
>> #0 0xbbaf923f in kill () from /usr/lib/libc.so.12
>> #1 0xbbb95a64 in abort () from /usr/lib/libc.so.12
>> #2 0xbbbdc60c in __res_state () from /usr/lib/libpthread.so.0
>> #3 0x0806b9ca in dcc_res_delays (budget=4) at get_port.c:476
>> #4 0x080660f2 in dcc_clnt_rdy (emsg=0xb91fff50 "", ctxt=0x80c9000,
>> clnt_fgs=8 '\b') at clnt_send.c:1740
>> #5 0x080544dc in clnt_resolve_thread (arg=0x0) at clnt_threaded.c:394
>> #6 0xbbbe562d in pthread_join () from /usr/lib/libpthread.so.0
>> #7 0xbbb1aa2c in swapcontext () from /usr/lib/libc.so.12
>
>Now that you mention it, I saw an instance of it a week or two ago,
>but hoped it was a fluke. I've been unable to reproduce it then or today.
>
>That's an ugly one, because it's not in my code.
>This is the relevant part of my get_port.c:
>
> if (!dcc_host_locked)
> dcc_logbad(EX_SOFTWARE, "dcc_get_host() not locked");
>
> /* get the current value */
> if (!(_res.options & RES_INIT))
> res_init();
>
>dcc_logbad() calls abort() after syslog().
>Because I assume the resolver is not thread safe and check that it's
>locked, it can't be a simple, valid locking problem.
>
>I guess I'll have to look for NetBSD's version of the resolver library
>to see what NetBSD has done to it. There are no abort() calls in the
>FreeBSD 7.1 version of res_state.c
>
>
>Have I mentioned that I'm not a fan of the clean target in
>the NetBSD bsd.prog.mk because it deletes .gdbinit?
>
>
>thanks,
>Vernon Schryver vjs%rhyolite.com@localhost
>_______________________________________________
>DCC mailing list DCC%rhyolite.com@localhost
>http://www.rhyolite.com/mailman/listinfo/dcc
>
>----- End forwarded message -----
>----- Forwarded message from MrC <lists-dcc%cappella.us@localhost> -----
>
>Date: Fri, 05 Jun 2009 16:16:23 -0700
>From: MrC <lists-dcc%cappella.us@localhost>
>To: dcc%rhyolite.com@localhost
>
>
>
>On 6/5/2009 2:52 PM, Vernon Schryver wrote:
>>> From: MrC<lists-dcc%cappella.us@localhost>
>>
>>> As expected, this is the result of a kill(2) call.
>>>
>>> #0 0xbbaf923f in kill () from /usr/lib/libc.so.12
>>> #1 0xbbb95a64 in abort () from /usr/lib/libc.so.12
>>> #2 0xbbbdc60c in __res_state () from /usr/lib/libpthread.so.0
>>> #3 0x0806b9ca in dcc_res_delays (budget=4) at get_port.c:476
>>> #4 0x080660f2 in dcc_clnt_rdy (emsg=0xb91fff50 "", ctxt=0x80c9000,
>>> clnt_fgs=8 '\b') at clnt_send.c:1740
>>> #5 0x080544dc in clnt_resolve_thread (arg=0x0) at clnt_threaded.c:394
>>> #6 0xbbbe562d in pthread_join () from /usr/lib/libpthread.so.0
>>> #7 0xbbb1aa2c in swapcontext () from /usr/lib/libc.so.12
>>
>> Now that you mention it, I saw an instance of it a week or two ago,
>> but hoped it was a fluke. I've been unable to reproduce it then or today.
>>
>> That's an ugly one, because it's not in my code.
>> This is the relevant part of my get_port.c:
>>
>> if (!dcc_host_locked)
>> dcc_logbad(EX_SOFTWARE, "dcc_get_host() not locked");
>>
>> /* get the current value */
>> if (!(_res.options& RES_INIT))
>> res_init();
>>
>> dcc_logbad() calls abort() after syslog().
>> Because I assume the resolver is not thread safe and check that it's
>> locked, it can't be a simple, valid locking problem.
>>
>> I guess I'll have to look for NetBSD's version of the resolver library
>> to see what NetBSD has done to it. There are no abort() calls in the
>> FreeBSD 7.1 version of res_state.c
>>
>
>I don't see anything that immediately jumps out between changes in 4.0 and
>4.0.1, but I'm long since familiar with libc code.
>
>
>Here's the abort() in the libpthread version of res_state:
>
>libpthread/res_state.c
>
> /*
>* This is aliased via a macro to _res; don't allow multi-threaded programs
> * to use it.
> */
> res_state
> __res_state(void)
> {
>static const char res[] = "_res is not supported for multi-threaded"
> " programs.\n";
> (void)write(STDERR_FILENO, res, sizeof(res) - 1);
> abort();
> return NULL;
> }
>
>
>The libc version uses weak aliases:
>
> #include <sys/cdefs.h>
> #if defined(LIBC_SCCS) && !defined(lint)
>__RCSID("$NetBSD: res_state.c,v 1.5.10.1 2007/05/17 21:25:19 jdc Exp $");
> #endif
>
> #include <sys/types.h>
> #include <arpa/inet.h>
> #include <arpa/nameser.h>
> #include <netdb.h>
> #include <resolv.h>
>
> struct __res_state _nres
> # if defined(__BIND_RES_TEXT)
> = { .retrans = RES_TIMEOUT, } /*%< Motorola, et al. */
> # endif
> ;
>
> res_state __res_get_state_nothread(void);
> void __res_put_state_nothread(res_state);
>
> #ifdef __weak_alias
> __weak_alias(__res_get_state, __res_get_state_nothread)
> __weak_alias(__res_put_state, __res_put_state_nothread)
> /* Source compatibility; only for single threaded programs */
> __weak_alias(__res_state, __res_get_state_nothread)
> #endif
>
> res_state
> __res_get_state_nothread(void)
> {
>if ((_nres.options & RES_INIT) == 0 && res_ninit(&_nres) == -1) {
> h_errno = NETDB_INTERNAL;
> return NULL;
> }
> return &_nres;
> }
>
>And dccifd is linked against libpthread:
>
> $ ldd /var/dcc/libexec/dccifd
> /var/dcc/libexec/dccifd:
> -lpthread.0 => /usr/lib/libpthread.so.0
> -lm.0 => /usr/lib/libm387.so.0
> -lm.0 => /usr/lib/libm.so.0
> -lc.12 => /usr/lib/libc.so.12
>
>
>Let me know if there is something I can do to help.
>
>
>> Have I mentioned that I'm not a fan of the clean target in
>> the NetBSD bsd.prog.mk because it deletes .gdbinit?
>>
>
>Oh, boy. They really mean *squeaky clean*.
>
>Thanks for your time,
>Mike
>_______________________________________________
>DCC mailing list DCC%rhyolite.com@localhost
>http://www.rhyolite.com/mailman/listinfo/dcc
>
>----- End forwarded message -----
>----- Forwarded message from Vernon Schryver
><vjs%calcite.rhyolite.com@localhost> -----
>
>Date: Sat, 6 Jun 2009 00:17:27 GMT
>From: Vernon Schryver <vjs%calcite.rhyolite.com@localhost>
>To: dcc%rhyolite.com@localhost, lists-dcc%cappella.us@localhost
>
>> From: MrC <lists-dcc%cappella.us@localhost>
>
>> Here's the abort() in the libpthread version of res_state:
>
>> static const char res[] = "_res is not supported for
>> multi-threaded"
>> " programs.\n";
>> (void)write(STDERR_FILENO, res, sizeof(res) - 1);
>> abort();
>> return NULL;
>> }
>
>That is utterly lame, overprotective mommy-ism. Besides, it stupidly
>assumes that stderr has not been closed, as competent programmer does
>in a daemon.
>
>A competent programmer knows to suspect that a library is not thread
>safe unless it explicitly says it is. More than that, you assume that
>a library that keeps internal state like _res.options and res_init()
>is not thread safe unless it both claims to be thread safe and you can't
>find any evidence about problems.
>
>And to omit any hint of the nonsense in the resolver man page!
>There is this passing mention in resolve.h:
> * Source and Binary compatibility; _res will not work properly
> * with multi-threaded programs.
>
>
>> And dccifd is linked against libpthread:
>
>Because dccifd and dccm use threads
>
>
>> Let me know if there is something I can do to help.
>
>Any suggestions on the least nasty kludge to link dccifd and dccm
>to the libc resolver intead of the broken-by-design libpthread
>resolver? I'd have to force not only the res_state hooks, but the
>whole resolver edifice including anything called inside gethostby*().
>
>Should I change the Makefiles to treat NetBSD like Windows and not build
>dccifd and dccm under the toy-applications-for-toy-operating-systems rule?
>
>Maybe I can arrange to not tweak the resolver timeouts for the threaded
>DCC programs to limit the total DCC delays and so keep SpamAssassin and
>MTAs from being unhappy.
>
>
>Have I mentioned I'm becoming ever less enamored of recent versions of
>NetBSD? The Linux experts only gratuitously, incompatibly changed the
>names of the resolver hooks.
>
>
>thanks any way,
>Vernon Schryver vjs%rhyolite.com@localhost
>_______________________________________________
>DCC mailing list DCC%rhyolite.com@localhost
>http://www.rhyolite.com/mailman/listinfo/dcc
>
>----- End forwarded message -----
>----- Forwarded message from Paul Vixie <vixie%isc.org@localhost> -----
>
>Date: Sat, 06 Jun 2009 00:49:55 +0000
>From: Paul Vixie <vixie%isc.org@localhost>
>To: Vernon Schryver <vjs%calcite.rhyolite.com@localhost>
>Cc: lists-dcc%cappella.us@localhost, dcc%rhyolite.com@localhost
>
>the strange part of all this is, there has been a thread safe superset
>of libresolv for about 11 years. just call res_ninit instead of
>res_init, and res_nsend instead of res_send. netbsd doesn't know this?
>_______________________________________________
>DCC mailing list DCC%rhyolite.com@localhost
>http://www.rhyolite.com/mailman/listinfo/dcc
>
>----- End forwarded message -----
>----- Forwarded message from Vernon Schryver
><vjs%calcite.rhyolite.com@localhost> -----
>
>Date: Sat, 6 Jun 2009 01:02:56 GMT
>From: Vernon Schryver <vjs%calcite.rhyolite.com@localhost>
>To: dcc%calcite.rhyolite.com@localhost
>
>> From: Paul Vixie <vixie%isc.org@localhost>
>> To: Vernon Schryver <vjs%calcite.rhyolite.com@localhost>
>> cc: dcc%rhyolite.com@localhost, lists-dcc%cappella.us@localhost
>
>> the strange part of all this is, there has been a thread safe superset
>> of libresolv for about 11 years. just call res_ninit instead of
>> res_init, and res_nsend instead of res_send. netbsd doesn't know this?
>
>I didn't know that, but now that you mention it, it's in non-NetBSD
>versions of res_state.c, and so probably in NetBSD versions.
>I'd not trust it without reading the whole resolver library because
>"thread" does appear in res_state.c. But there are plenty of
>pthread.h and similar stains in other files in the
>FreeBSD version of /usr/src/lib/libc/resolv/*
>
>
>While I'm ranting about NetBSD, I wish they'd get with the program in
>minor ways that wouldn't break anything. I can't think of an excuse
>for the spews of compiler warnings like these:
> sign.c:77: warning: pointer targets in passing argument 2 of
>'MD5Update' differ in signedness
>Declaring your MD5Update() to take a const void * instead of a const
>unsigned char * should be done even before you tune it. It should be
>done when you add "const" to the ancient code from the RFC 1321.
>
>
>Vernon Schryver vjs%rhyolite.com@localhost
>_______________________________________________
>DCC mailing list DCC%rhyolite.com@localhost
>http://www.rhyolite.com/mailman/listinfo/dcc
>
>----- End forwarded message -----
>
Home |
Main Index |
Thread Index |
Old Index