tech-net archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: [Fwd: Re: dccifd: restart after signal 6]



In article <20090607093815.GA4811%pintail.smokva.net@localhost>,
Petar Bogdanovic  <petar%smokva.net@localhost> wrote:

How many years will it take people to switch from res_foo to res_nfoo
which is re-entrant? Really, these functions have been there for
more than a decade. There is no excuse for a multi-threaded program
not to use them if they are available and instead use their own
API and do their own locking. What is next? Don't use the _r functions
and do your own locking? Add mutexes to protect a possibly non-thread-safe
malloc? Unless of course I am missing something, and if so I apologize.
99% of the programs that use _res in a multi-threaded environment do
so incorrectly, and it is a good thing to make them use the new API's.
Someone made a mistake exposing _res a *long* time ago. Let's not
perpetuate it by having new code try to use it.

christos

>Hi,
>
>attached is an excerpt from the [1]DCC mailing list.  One of the dcc
>daemons started to die unexpectedly because it used _res in conjunction
>with threads which seems to be a weird combination on NetBSD even if you
>assume that your resolver is not thread safe and lock him properly:
>
>/usr/src/include/resolv.h:
>       /*
>        * Source and Binary compatibility; _res will not work properly
>        * with multi-threaded programs.
>        */
>       extern struct __res_state *__res_state(void);
>       #define _res (*__res_state())
>
>/usr/src/lib/libpthread/res_state.c:
>       /*
>        * This is aliased via a macro to _res; don't allow multi-threaded 
> programs
>        * to use it.
>        */
>       res_state
>       __res_state(void)
>       {
>               static const char res[] = "_res is not supported for 
> multi-threaded"
>                   " programs.\n";
>               (void)write(STDERR_FILENO, res, sizeof(res) - 1);
>               abort();
>               return NULL;
>       }
>
>
>At least that's how I interpreted the story so it would be nice if
>somebody here could analyze/comment the issue.
>
>Thank you,
>
>
>
>   Petar Bogdanovic
>
>
>
>[1] http://www.rhyolite.com/dcc/
>
>----- Forwarded message from Vernon Schryver 
><vjs%calcite.rhyolite.com@localhost> -----
>
>Date: Fri, 5 Jun 2009 21:52:05 GMT
>From: Vernon Schryver <vjs%calcite.rhyolite.com@localhost>
>To: dcc%rhyolite.com@localhost, lists-dcc%cappella.us@localhost
>
>> From: MrC <lists-dcc%cappella.us@localhost>
>
>> As expected, this is the result of a kill(2) call.
>>
>> #0  0xbbaf923f in kill () from /usr/lib/libc.so.12
>> #1  0xbbb95a64 in abort () from /usr/lib/libc.so.12
>> #2  0xbbbdc60c in __res_state () from /usr/lib/libpthread.so.0
>> #3  0x0806b9ca in dcc_res_delays (budget=4) at get_port.c:476
>> #4  0x080660f2 in dcc_clnt_rdy (emsg=0xb91fff50 "", ctxt=0x80c9000, 
>> clnt_fgs=8 '\b') at clnt_send.c:1740
>> #5  0x080544dc in clnt_resolve_thread (arg=0x0) at clnt_threaded.c:394
>> #6  0xbbbe562d in pthread_join () from /usr/lib/libpthread.so.0
>> #7  0xbbb1aa2c in swapcontext () from /usr/lib/libc.so.12
>
>Now that you mention it, I saw an instance of it a week or two ago,
>but hoped it was a fluke.  I've been unable to reproduce it then or today.
>
>That's an ugly one, because it's not in my code.
>This is the relevant part of my get_port.c:
>
>       if (!dcc_host_locked)
>               dcc_logbad(EX_SOFTWARE, "dcc_get_host() not locked");
>
>       /* get the current value */
>       if (!(_res.options & RES_INIT))
>               res_init();
>
>dcc_logbad() calls abort() after syslog().
>Because I assume the resolver is not thread safe and check that it's
>locked, it can't be a simple, valid locking problem.
>
>I guess I'll have to look for NetBSD's version of the resolver library
>to see what NetBSD has done to it.  There are no abort() calls in the
>FreeBSD 7.1 version of res_state.c
>
>
>Have I mentioned that I'm not a fan of the clean target in
>the NetBSD bsd.prog.mk because it deletes .gdbinit?
>
>
>thanks,
>Vernon Schryver    vjs%rhyolite.com@localhost
>_______________________________________________
>DCC mailing list      DCC%rhyolite.com@localhost
>http://www.rhyolite.com/mailman/listinfo/dcc
>
>----- End forwarded message -----
>----- Forwarded message from MrC <lists-dcc%cappella.us@localhost> -----
>
>Date: Fri, 05 Jun 2009 16:16:23 -0700
>From: MrC <lists-dcc%cappella.us@localhost>
>To: dcc%rhyolite.com@localhost
>
>
>
>On 6/5/2009 2:52 PM, Vernon Schryver wrote:
>>> From: MrC<lists-dcc%cappella.us@localhost>
>>
>>> As expected, this is the result of a kill(2) call.
>>>
>>> #0  0xbbaf923f in kill () from /usr/lib/libc.so.12
>>> #1  0xbbb95a64 in abort () from /usr/lib/libc.so.12
>>> #2  0xbbbdc60c in __res_state () from /usr/lib/libpthread.so.0
>>> #3  0x0806b9ca in dcc_res_delays (budget=4) at get_port.c:476
>>> #4  0x080660f2 in dcc_clnt_rdy (emsg=0xb91fff50 "", ctxt=0x80c9000,
>>> clnt_fgs=8 '\b') at clnt_send.c:1740
>>> #5  0x080544dc in clnt_resolve_thread (arg=0x0) at clnt_threaded.c:394
>>> #6  0xbbbe562d in pthread_join () from /usr/lib/libpthread.so.0
>>> #7  0xbbb1aa2c in swapcontext () from /usr/lib/libc.so.12
>>
>> Now that you mention it, I saw an instance of it a week or two ago,
>> but hoped it was a fluke.  I've been unable to reproduce it then or today.
>>
>> That's an ugly one, because it's not in my code.
>> This is the relevant part of my get_port.c:
>>
>>      if (!dcc_host_locked)
>>              dcc_logbad(EX_SOFTWARE, "dcc_get_host() not locked");
>>
>>      /* get the current value */
>>      if (!(_res.options&  RES_INIT))
>>              res_init();
>>
>> dcc_logbad() calls abort() after syslog().
>> Because I assume the resolver is not thread safe and check that it's
>> locked, it can't be a simple, valid locking problem.
>>
>> I guess I'll have to look for NetBSD's version of the resolver library
>> to see what NetBSD has done to it.  There are no abort() calls in the
>> FreeBSD 7.1 version of res_state.c
>>
>
>I don't see anything that immediately jumps out between changes in 4.0 and 
>4.0.1, but I'm long since familiar with libc code.
>
>
>Here's the abort() in the libpthread version of res_state:
>
>libpthread/res_state.c
>
>    /*
>* This is aliased via a macro to _res; don't allow multi-threaded programs
>     * to use it.
>     */
>    res_state
>    __res_state(void)
>    {
>static const char res[] = "_res is not supported for multi-threaded"
>                " programs.\n";
>            (void)write(STDERR_FILENO, res, sizeof(res) - 1);
>            abort();
>            return NULL;
>    }
>
>
>The libc version uses weak aliases:
>
>    #include <sys/cdefs.h>
>    #if defined(LIBC_SCCS) && !defined(lint)
>__RCSID("$NetBSD: res_state.c,v 1.5.10.1 2007/05/17 21:25:19 jdc Exp $");
>    #endif
>
>    #include <sys/types.h>
>    #include <arpa/inet.h>
>    #include <arpa/nameser.h>
>    #include <netdb.h>
>    #include <resolv.h>
>
>    struct __res_state _nres
>    # if defined(__BIND_RES_TEXT)
>            = { .retrans = RES_TIMEOUT, }   /*%< Motorola, et al. */
>    # endif
>            ;
>
>    res_state __res_get_state_nothread(void);
>    void __res_put_state_nothread(res_state);
>
>    #ifdef __weak_alias
>    __weak_alias(__res_get_state, __res_get_state_nothread)
>    __weak_alias(__res_put_state, __res_put_state_nothread)
>    /* Source compatibility; only for single threaded programs */
>    __weak_alias(__res_state, __res_get_state_nothread)
>    #endif
>
>    res_state
>    __res_get_state_nothread(void)
>    {
>if ((_nres.options & RES_INIT) == 0 && res_ninit(&_nres) == -1) {
>                    h_errno = NETDB_INTERNAL;
>                    return NULL;
>            }
>            return &_nres;
>    }
>
>And dccifd is linked against libpthread:
>
>    $ ldd /var/dcc/libexec/dccifd
>    /var/dcc/libexec/dccifd:
>            -lpthread.0 => /usr/lib/libpthread.so.0
>            -lm.0 => /usr/lib/libm387.so.0
>            -lm.0 => /usr/lib/libm.so.0
>            -lc.12 => /usr/lib/libc.so.12
>
>
>Let me know if there is something I can do to help.
>
>
>> Have I mentioned that I'm not a fan of the clean target in
>> the NetBSD bsd.prog.mk because it deletes .gdbinit?
>>
>
>Oh, boy.  They really mean *squeaky clean*.
>
>Thanks for your time,
>Mike
>_______________________________________________
>DCC mailing list      DCC%rhyolite.com@localhost
>http://www.rhyolite.com/mailman/listinfo/dcc
>
>----- End forwarded message -----
>----- Forwarded message from Vernon Schryver 
><vjs%calcite.rhyolite.com@localhost> -----
>
>Date: Sat, 6 Jun 2009 00:17:27 GMT
>From: Vernon Schryver <vjs%calcite.rhyolite.com@localhost>
>To: dcc%rhyolite.com@localhost, lists-dcc%cappella.us@localhost
>
>> From: MrC <lists-dcc%cappella.us@localhost>
>
>> Here's the abort() in the libpthread version of res_state:
>
>>              static const char res[] = "_res is not supported for 
>> multi-threaded"
>>                  " programs.\n";
>>              (void)write(STDERR_FILENO, res, sizeof(res) - 1);
>>              abort();
>>              return NULL;
>>      }
>
>That is utterly lame, overprotective mommy-ism.  Besides, it stupidly
>assumes that stderr has not been closed, as competent programmer does
>in a daemon.
>
>A competent programmer knows to suspect that a library is not thread
>safe unless it explicitly says it is.  More than that, you assume that
>a library that keeps internal state like _res.options and res_init()
>is not thread safe unless it both claims to be thread safe and you can't
>find any evidence about problems.
>
>And to omit any hint of the nonsense in the resolver man page!
>There is this passing mention in resolve.h:
> * Source and Binary compatibility; _res will not work properly
> * with multi-threaded programs.
>
>
>> And dccifd is linked against libpthread:
>
>Because dccifd and dccm use threads
>
>
>> Let me know if there is something I can do to help.
>
>Any suggestions on the least nasty kludge to link dccifd and dccm
>to the libc resolver intead of the broken-by-design libpthread
>resolver?  I'd have to force not only the res_state hooks, but the
>whole resolver edifice including anything called inside gethostby*().
>
>Should I change the Makefiles to treat NetBSD like Windows and not build
>dccifd and dccm under the toy-applications-for-toy-operating-systems rule?
>
>Maybe I can arrange to not tweak the resolver timeouts for the threaded
>DCC programs to limit the total DCC delays and so keep SpamAssassin and
>MTAs from being unhappy.
>
>
>Have I mentioned I'm becoming ever less enamored of recent versions of
>NetBSD?  The Linux experts only gratuitously, incompatibly changed the
>names of the resolver hooks.
>
>
>thanks any way,
>Vernon Schryver    vjs%rhyolite.com@localhost
>_______________________________________________
>DCC mailing list      DCC%rhyolite.com@localhost
>http://www.rhyolite.com/mailman/listinfo/dcc
>
>----- End forwarded message -----
>----- Forwarded message from Paul Vixie <vixie%isc.org@localhost> -----
>
>Date: Sat, 06 Jun 2009 00:49:55 +0000
>From: Paul Vixie <vixie%isc.org@localhost>
>To: Vernon Schryver <vjs%calcite.rhyolite.com@localhost>
>Cc: lists-dcc%cappella.us@localhost, dcc%rhyolite.com@localhost
>
>the strange part of all this is, there has been a thread safe superset
>of libresolv for about 11 years.  just call res_ninit instead of
>res_init, and res_nsend instead of res_send.  netbsd doesn't know this?
>_______________________________________________
>DCC mailing list      DCC%rhyolite.com@localhost
>http://www.rhyolite.com/mailman/listinfo/dcc
>
>----- End forwarded message -----
>----- Forwarded message from Vernon Schryver 
><vjs%calcite.rhyolite.com@localhost> -----
>
>Date: Sat, 6 Jun 2009 01:02:56 GMT
>From: Vernon Schryver <vjs%calcite.rhyolite.com@localhost>
>To: dcc%calcite.rhyolite.com@localhost
>
>> From: Paul Vixie <vixie%isc.org@localhost>
>> To: Vernon Schryver <vjs%calcite.rhyolite.com@localhost>
>> cc: dcc%rhyolite.com@localhost, lists-dcc%cappella.us@localhost
>
>> the strange part of all this is, there has been a thread safe superset
>> of libresolv for about 11 years.  just call res_ninit instead of
>> res_init, and res_nsend instead of res_send.  netbsd doesn't know this?
>
>I didn't know that, but now that you mention it, it's in non-NetBSD
>versions of res_state.c, and so probably in NetBSD versions.
>I'd not trust it without reading the whole resolver library because
>"thread" does appear in res_state.c.  But there are plenty of
>pthread.h and similar stains in other files in the
>FreeBSD version of /usr/src/lib/libc/resolv/*
>
>
>While I'm ranting about NetBSD, I wish they'd get with the program in
>minor ways that wouldn't break anything.  I can't think of an excuse
>for the spews of compiler warnings like these:
>    sign.c:77: warning: pointer targets in passing argument 2 of
>'MD5Update' differ in signedness
>Declaring your MD5Update() to take a const void * instead of a const
>unsigned char * should be done even before you tune it.  It should be
>done when you add "const" to the ancient code from the RFC 1321.
>
>
>Vernon Schryver    vjs%rhyolite.com@localhost
>_______________________________________________
>DCC mailing list      DCC%rhyolite.com@localhost
>http://www.rhyolite.com/mailman/listinfo/dcc
>
>----- End forwarded message -----
>





Home | Main Index | Thread Index | Old Index