Subject: Are res_send and res_query thread-safe?
To: None <tech-userlevel@netbsd.org>
From: Emmanuel Dreyfus <manu@netbsd.org>
List: tech-userlevel
Date: 04/03/2004 15:30:47
Hi

In a multi-threaded program that does DNS lookups, I regularly get stuck
in the res_query/res_send code path:

#0  0x4193f238 in recvfrom () from /usr/lib/libc.so.12
#1  0x418a43c0 in __pth_sc_recvfrom () from /usr/pkg/lib/libpthread.so.20
#2  0x418a2cfc in pth_recvfrom_ev () from /usr/pkg/lib/libpthread.so.20
#3  0x418a2a7c in pth_recv_ev () from /usr/pkg/lib/libpthread.so.20
#4  0x418a2a50 in pth_recv () from /usr/pkg/lib/libpthread.so.20
#5  0x418a4444 in recv () from /usr/pkg/lib/libpthread.so.20
#6  0x418a10fc in pth_poll_ev () from /usr/pkg/lib/libpthread.so.20
#7  0x418a0d44 in pth_poll () from /usr/pkg/lib/libpthread.so.20
#8  0x418a3d24 in poll () from /usr/pkg/lib/libpthread.so.20
#9  0x418799fc in res_send () from /usr/lib/libresolv.so.1
#10 0x41877ef4 in res_query () from /usr/lib/libresolv.so.1

Once it is stuck there, the process never makes progress again and I
have to kill -9 it.

I suspect a re-entrancy problem. Are our resolver functions known to be
thread-safe? Searching the web shows that res_send had some re-entrancy
problems a long time ago:
http://sources.redhat.com/ml/libc-alpha/1999-07/msg00080.html

Is that fixed in our tree? (BTW, which version of BIND is our resolver
based on in the 1.6 branch?)

If re-entrancy is not the problem, does anyone have another idea? Of
course, the program works perfectly as long as it does not do any DNS
lookups.
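
One way to test the re-entrancy theory would be to serialize every
resolver call behind a single mutex and see whether the hang goes away.
Below is a minimal sketch, assuming POSIX threads. The names
serialized_lookup, lookup_fn, stub_lookup and run_demo are hypothetical,
and the stub stands in for res_query so that the sketch runs without a
network or a resolver library.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical callback type: stands in for a call such as res_query(). */
typedef int (*lookup_fn)(const char *name, void *arg);

/* One process-wide lock: only one thread is inside the resolver at a time. */
static pthread_mutex_t resolver_lock = PTHREAD_MUTEX_INITIALIZER;

/* Run fn(name, arg) with the resolver lock held and return its result. */
int serialized_lookup(lookup_fn fn, const char *name, void *arg)
{
    int rc;

    pthread_mutex_lock(&resolver_lock);
    rc = fn(name, arg);
    pthread_mutex_unlock(&resolver_lock);
    return rc;
}

/* Stub standing in for a non-reentrant resolver (an assumption made for
 * this sketch): it updates shared state without any locking of its own. */
static long shared_calls;

static int stub_lookup(const char *name, void *arg)
{
    (void)name;
    (void)arg;
    shared_calls++;
    return 0;
}

struct worker_args { int iterations; };

/* Each worker performs its lookups through the serialized wrapper only. */
static void *worker(void *p)
{
    const struct worker_args *a = p;
    int i;

    for (i = 0; i < a->iterations; i++)
        serialized_lookup(stub_lookup, "example.org", NULL);
    return NULL;
}

/* Start nthreads workers (at most 16), wait for them, and return the
 * number of stub calls recorded. */
long run_demo(int nthreads, int iterations)
{
    pthread_t tid[16];
    struct worker_args a;
    int i;

    assert(nthreads > 0 && nthreads <= 16);
    a.iterations = iterations;
    shared_calls = 0;
    for (i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tid[i], NULL, worker, &a);
        assert(rc == 0);
    }
    for (i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);
    return shared_calls;
}
```

In the real program the callback would wrap the actual res_query() call,
with the answer buffer passed through arg. This only tells us something
if every resolver user in the process goes through the wrapper; a single
unprotected caller defeats it.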

-- 
Emmanuel Dreyfus
There are 10 kinds of people in the world: those who understand
binary and those who do not.
manu@netbsd.org