tech-net archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

[Fwd: Re: dccifd: restart after signal 6]



Hi,

attached is an excerpt from the [1]DCC mailing list.  One of the dcc
daemons started to die unexpectedly because it used _res in conjunction
with threads which seems to be a weird combination on NetBSD even if you
assume that your resolver is not thread safe and lock him properly:

/usr/src/include/resolv.h:
        /*
         * Source and Binary compatibility; _res will not work properly
         * with multi-threaded programs.
         */
        extern struct __res_state *__res_state(void);
        #define _res (*__res_state())

/usr/src/lib/libpthread/res_state.c:
        /*
         * This is aliased via a macro to _res; don't allow multi-threaded 
programs
         * to use it.
         */
        res_state
        __res_state(void)
        {
                static const char res[] = "_res is not supported for 
multi-threaded"
                    " programs.\n";
                (void)write(STDERR_FILENO, res, sizeof(res) - 1);
                abort();
                return NULL;
        }


At least that's how I interpreted the story so it would be nice if
somebody here could analyze/comment the issue.

Thank you,



   Petar Bogdanovic



[1] http://www.rhyolite.com/dcc/

----- Forwarded message from Vernon Schryver 
<vjs%calcite.rhyolite.com@localhost> -----

Date: Fri, 5 Jun 2009 21:52:05 GMT
From: Vernon Schryver <vjs%calcite.rhyolite.com@localhost>
To: dcc%rhyolite.com@localhost, lists-dcc%cappella.us@localhost

> From: MrC <lists-dcc%cappella.us@localhost>

> As expected, this is the result of a kill(2) call.
>
> #0  0xbbaf923f in kill () from /usr/lib/libc.so.12
> #1  0xbbb95a64 in abort () from /usr/lib/libc.so.12
> #2  0xbbbdc60c in __res_state () from /usr/lib/libpthread.so.0
> #3  0x0806b9ca in dcc_res_delays (budget=4) at get_port.c:476
> #4  0x080660f2 in dcc_clnt_rdy (emsg=0xb91fff50 "", ctxt=0x80c9000, 
> clnt_fgs=8 '\b') at clnt_send.c:1740
> #5  0x080544dc in clnt_resolve_thread (arg=0x0) at clnt_threaded.c:394
> #6  0xbbbe562d in pthread_join () from /usr/lib/libpthread.so.0
> #7  0xbbb1aa2c in swapcontext () from /usr/lib/libc.so.12

Now that you mention it, I saw an instance of it a week or two ago,
but hoped it was a fluke.  I've been unable to reproduce it then or today.

That's an ugly one, because it's not in my code.
This is the relevant part of my get_port.c:

        if (!dcc_host_locked)
                dcc_logbad(EX_SOFTWARE, "dcc_get_host() not locked");

        /* get the current value */
        if (!(_res.options & RES_INIT))
                res_init();

dcc_logbad() calls abort() after syslog().
Because I assume the resolver is not thread safe and check that it's
locked, it can't be a simple, valid locking problem.

I guess I'll have to look for NetBSD's version of the resolver library
to see what NetBSD has done to it.  There are no abort() calls in the
FreeBSD 7.1 version of res_state.c


Have I mentioned that I'm not a fan of the clean target in
the NetBSD bsd.prog.mk because it deletes .gdbinit?


thanks,
Vernon Schryver    vjs%rhyolite.com@localhost
_______________________________________________
DCC mailing list      DCC%rhyolite.com@localhost
http://www.rhyolite.com/mailman/listinfo/dcc

----- End forwarded message -----
----- Forwarded message from MrC <lists-dcc%cappella.us@localhost> -----

Date: Fri, 05 Jun 2009 16:16:23 -0700
From: MrC <lists-dcc%cappella.us@localhost>
To: dcc%rhyolite.com@localhost



On 6/5/2009 2:52 PM, Vernon Schryver wrote:
>> From: MrC<lists-dcc%cappella.us@localhost>
>
>> As expected, this is the result of a kill(2) call.
>>
>> #0  0xbbaf923f in kill () from /usr/lib/libc.so.12
>> #1  0xbbb95a64 in abort () from /usr/lib/libc.so.12
>> #2  0xbbbdc60c in __res_state () from /usr/lib/libpthread.so.0
>> #3  0x0806b9ca in dcc_res_delays (budget=4) at get_port.c:476
>> #4  0x080660f2 in dcc_clnt_rdy (emsg=0xb91fff50 "", ctxt=0x80c9000,
>> clnt_fgs=8 '\b') at clnt_send.c:1740
>> #5  0x080544dc in clnt_resolve_thread (arg=0x0) at clnt_threaded.c:394
>> #6  0xbbbe562d in pthread_join () from /usr/lib/libpthread.so.0
>> #7  0xbbb1aa2c in swapcontext () from /usr/lib/libc.so.12
>
> Now that you mention it, I saw an instance of it a week or two ago,
> but hoped it was a fluke.  I've been unable to reproduce it then or today.
>
> That's an ugly one, because it's not in my code.
> This is the relevant part of my get_port.c:
>
>       if (!dcc_host_locked)
>               dcc_logbad(EX_SOFTWARE, "dcc_get_host() not locked");
>
>       /* get the current value */
>       if (!(_res.options&  RES_INIT))
>               res_init();
>
> dcc_logbad() calls abort() after syslog().
> Because I assume the resolver is not thread safe and check that it's
> locked, it can't be a simple, valid locking problem.
>
> I guess I'll have to look for NetBSD's version of the resolver library
> to see what NetBSD has done to it.  There are no abort() calls in the
> FreeBSD 7.1 version of res_state.c
>

I don't see anything that immediately jumps out between changes in 4.0 and 
4.0.1, but I'm long since familiar with libc code.


Here's the abort() in the libpthread version of res_state:

libpthread/res_state.c

    /*
* This is aliased via a macro to _res; don't allow multi-threaded programs
     * to use it.
     */
    res_state
    __res_state(void)
    {
static const char res[] = "_res is not supported for multi-threaded"
                " programs.\n";
            (void)write(STDERR_FILENO, res, sizeof(res) - 1);
            abort();
            return NULL;
    }


The libc version uses weak aliases:

    #include <sys/cdefs.h>
    #if defined(LIBC_SCCS) && !defined(lint)
__RCSID("$NetBSD: res_state.c,v 1.5.10.1 2007/05/17 21:25:19 jdc Exp $");
    #endif

    #include <sys/types.h>
    #include <arpa/inet.h>
    #include <arpa/nameser.h>
    #include <netdb.h>
    #include <resolv.h>

    struct __res_state _nres
    # if defined(__BIND_RES_TEXT)
            = { .retrans = RES_TIMEOUT, }   /*%< Motorola, et al. */
    # endif
            ;

    res_state __res_get_state_nothread(void);
    void __res_put_state_nothread(res_state);

    #ifdef __weak_alias
    __weak_alias(__res_get_state, __res_get_state_nothread)
    __weak_alias(__res_put_state, __res_put_state_nothread)
    /* Source compatibility; only for single threaded programs */
    __weak_alias(__res_state, __res_get_state_nothread)
    #endif

    res_state
    __res_get_state_nothread(void)
    {
if ((_nres.options & RES_INIT) == 0 && res_ninit(&_nres) == -1) {
                    h_errno = NETDB_INTERNAL;
                    return NULL;
            }
            return &_nres;
    }

And dccifd is linked against libpthread:

    $ ldd /var/dcc/libexec/dccifd
    /var/dcc/libexec/dccifd:
            -lpthread.0 => /usr/lib/libpthread.so.0
            -lm.0 => /usr/lib/libm387.so.0
            -lm.0 => /usr/lib/libm.so.0
            -lc.12 => /usr/lib/libc.so.12


Let me know if there is something I can do to help.


> Have I mentioned that I'm not a fan of the clean target in
> the NetBSD bsd.prog.mk because it deletes .gdbinit?
>

Oh, boy.  They really mean *squeaky clean*.

Thanks for your time,
Mike
_______________________________________________
DCC mailing list      DCC%rhyolite.com@localhost
http://www.rhyolite.com/mailman/listinfo/dcc

----- End forwarded message -----
----- Forwarded message from Vernon Schryver 
<vjs%calcite.rhyolite.com@localhost> -----

Date: Sat, 6 Jun 2009 00:17:27 GMT
From: Vernon Schryver <vjs%calcite.rhyolite.com@localhost>
To: dcc%rhyolite.com@localhost, lists-dcc%cappella.us@localhost

> From: MrC <lists-dcc%cappella.us@localhost>

> Here's the abort() in the libpthread version of res_state:

>              static const char res[] = "_res is not supported for 
> multi-threaded"
>                  " programs.\n";
>              (void)write(STDERR_FILENO, res, sizeof(res) - 1);
>              abort();
>              return NULL;
>      }

That is utterly lame, overprotective mommy-ism.  Besides, it stupidly
assumes that stderr has not been closed, as competent programmer does
in a daemon.

A competent programmer knows to suspect that a library is not thread
safe unless it explicitly says it is.  More than that, you assume that
a library that keeps internal state like _res.options and res_init()
is not thread safe unless it both claims to be thread safe and you can't
find any evidence about problems.

And to omit any hint of the nonsense in the resolver man page!
There is this passing mention in resolve.h:
 * Source and Binary compatibility; _res will not work properly
 * with multi-threaded programs.


> And dccifd is linked against libpthread:

Because dccifd and dccm use threads


> Let me know if there is something I can do to help.

Any suggestions on the least nasty kludge to link dccifd and dccm
to the libc resolver intead of the broken-by-design libpthread
resolver?  I'd have to force not only the res_state hooks, but the
whole resolver edifice including anything called inside gethostby*().

Should I change the Makefiles to treat NetBSD like Windows and not build
dccifd and dccm under the toy-applications-for-toy-operating-systems rule?

Maybe I can arrange to not tweak the resolver timeouts for the threaded
DCC programs to limit the total DCC delays and so keep SpamAssassin and
MTAs from being unhappy.


Have I mentioned I'm becoming ever less enamored of recent versions of
NetBSD?  The Linux experts only gratuitously, incompatibly changed the
names of the resolver hooks.


thanks any way,
Vernon Schryver    vjs%rhyolite.com@localhost
_______________________________________________
DCC mailing list      DCC%rhyolite.com@localhost
http://www.rhyolite.com/mailman/listinfo/dcc

----- End forwarded message -----
----- Forwarded message from Paul Vixie <vixie%isc.org@localhost> -----

Date: Sat, 06 Jun 2009 00:49:55 +0000
From: Paul Vixie <vixie%isc.org@localhost>
To: Vernon Schryver <vjs%calcite.rhyolite.com@localhost>
Cc: lists-dcc%cappella.us@localhost, dcc%rhyolite.com@localhost

the strange part of all this is, there has been a thread safe superset
of libresolv for about 11 years.  just call res_ninit instead of
res_init, and res_nsend instead of res_send.  netbsd doesn't know this?
_______________________________________________
DCC mailing list      DCC%rhyolite.com@localhost
http://www.rhyolite.com/mailman/listinfo/dcc

----- End forwarded message -----
----- Forwarded message from Vernon Schryver 
<vjs%calcite.rhyolite.com@localhost> -----

Date: Sat, 6 Jun 2009 01:02:56 GMT
From: Vernon Schryver <vjs%calcite.rhyolite.com@localhost>
To: dcc%calcite.rhyolite.com@localhost

> From: Paul Vixie <vixie%isc.org@localhost>
> To: Vernon Schryver <vjs%calcite.rhyolite.com@localhost>
> cc: dcc%rhyolite.com@localhost, lists-dcc%cappella.us@localhost

> the strange part of all this is, there has been a thread safe superset
> of libresolv for about 11 years.  just call res_ninit instead of
> res_init, and res_nsend instead of res_send.  netbsd doesn't know this?

I didn't know that, but now that you mention it, it's in non-NetBSD
versions of res_state.c, and so probably in NetBSD versions.
I'd not trust it without reading the whole resolver library because
"thread" does appear in res_state.c.  But there are plenty of
pthread.h and similar stains in other files in the
FreeBSD version of /usr/src/lib/libc/resolv/*


While I'm ranting about NetBSD, I wish they'd get with the program in
minor ways that wouldn't break anything.  I can't think of an excuse
for the spews of compiler warnings like these:
    sign.c:77: warning: pointer targets in passing argument 2 of 'MD5Update' 
differ in signedness
Declaring your MD5Update() to take a const void * instead of a const
unsigned char * should be done even before you tune it.  It should be
done when you add "const" to the ancient code from the RFC 1321.


Vernon Schryver    vjs%rhyolite.com@localhost
_______________________________________________
DCC mailing list      DCC%rhyolite.com@localhost
http://www.rhyolite.com/mailman/listinfo/dcc

----- End forwarded message -----


Home | Main Index | Thread Index | Old Index