tech-net archive


Re: [Fwd: Re: dccifd: restart after signal 6]



On Jun 8,  2:25pm, vjs%calcite.rhyolite.com@localhost (Vernon Schryver) wrote:
-- Subject: Re: [Fwd: Re: dccifd: restart after signal 6]

| > To: tech-net%netbsd.org@localhost
| > From:  christos%astron.com@localhost (Christos Zoulas)
| 
| > How many years will it take people to switch from res_foo to res_nfoo
| > which is re-entrant? Really, these functions have been there for
| > more than a decade. There is no excuse for a multi-threaded program
| > not to use them if they are available and instead use their own
| > API and do their own locking. What is next? Don't use the _r functions
| > and do your own locking? Add mutexes to protect a possibly non-thread-safe
| > malloc? Unless of course I am missing something, and if so I apologize.
| > 99% of the programs that use _res in a multi-threaded environment do
| > so incorrectly, and it is a good thing to make them use the new APIs.
| > Someone made a mistake exposing _res a *long* time ago. Let's not
| > perpetuate it by having new code try to use it.
| 
| It is crazy to expect people to use undocumented facilities.
| "res_init", "res_send", and so forth appear in `man 3 resolver` on
| "NetBSD 4.0.1 (GENERIC) " but the strings "nres" and "thread" do
| not appear even once.

Yes, this is NetBSD's fault. The documentation is there, it just has
not been updated. I will fix it right now.
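For readers without the updated man page, here is a minimal sketch of the re-entrant style under discussion: each thread owns a resolver state and passes it explicitly instead of touching the global _res. The helper name init_private_resolver() and its parameters are hypothetical; res_ninit(), res_nclose(), and the retrans/retry fields come from the BIND-derived <resolv.h>, and some platforms need -lresolv to link.

```c
#include <sys/types.h>
#include <netinet/in.h>
#include <arpa/nameser.h>
#include <resolv.h>
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: initialise a caller-owned resolver state and
 * tighten its timeouts.  Returns 0 on success, -1 on failure.
 */
int
init_private_resolver(res_state statp, int retrans_secs, int retries)
{
	assert(statp != NULL);
	memset(statp, 0, sizeof(*statp));
	if (res_ninit(statp) == -1)
		return -1;
	statp->retrans = retrans_secs;	/* retransmission interval, seconds */
	statp->retry = retries;		/* number of times to retransmit */
	return 0;
}
```

Queries would then go through res_nquery(statp, ...) or res_nsearch(statp, ...) with the same state, and the state is released with res_nclose(statp) when the thread is done with it.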

| Then there is the issue of how the resolver state structure set by
| res_nfoo is supposed to be communicated to the resolver inside
| gethostbyname().  Putting values into a per-thread, caller-maintained
| structure does nothing unless the library code knows to use the per-thread
| structure instead of the common global structure.  Have you bothered
| to check that changing values with res_nfoo has any effect?  Maybe set
| the retry limits to 1 retransmission and 1 second and then see how soon
| a failure happens when you try to resolve a domain whose authoritative
| server does not answer at all?  (I use timeo.rhyolite.com for such tests)

gethostbyname() is another obsolete, non-thread-safe interface; it has
been replaced by getaddrinfo().
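As an illustration of the replacement interface, the sketch below resolves a name with getaddrinfo() and prints the first result with getnameinfo(). Results come back in caller-owned memory rather than a static buffer, which is what makes the call usable from several threads. The helper name first_address() is hypothetical, and this sketch does not show any way to pass custom resolver timeouts.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: resolve `host` and write its first address, in
 * numeric text form, into `buf`.  Returns 0 on success, otherwise the
 * error code from getaddrinfo() or getnameinfo() (see gai_strerror()).
 */
int
first_address(const char *host, char *buf, size_t buflen)
{
	struct addrinfo hints, *res;
	int error;

	assert(host != NULL && buf != NULL);
	memset(&hints, 0, sizeof(hints));
	hints.ai_family = AF_UNSPEC;		/* IPv4 or IPv6 */
	hints.ai_socktype = SOCK_STREAM;	/* one entry per address */

	error = getaddrinfo(host, NULL, &hints, &res);
	if (error != 0)
		return error;

	/* The result list is allocated per call; nothing is shared. */
	error = getnameinfo(res->ai_addr, res->ai_addrlen,
	    buf, (socklen_t)buflen, NULL, 0, NI_NUMERICHOST);
	freeaddrinfo(res);
	return error;
}
```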

| It is incompetent to convert _res into a run-time crash when correct or
| less bad things to do are so easy and obvious.  You could easily make
| threaded programs not link if they use _res.  Another tactic would be
| to change the threaded version of the resolver library to use per-thread
| _res structure, and to make the res_foo() functions in the threaded
| resolver library call res_nfoo().

That is harder to do, since the threaded program links in with libc
which needs the _res symbol defined.

| That you call abort() after writing to stderr is emblematic of
| the NetBSD problems.  It would be dicey to call syslog(), but simply
| assuming that stderr has not been long since closed is at best far
| too naive for anyone allowed to touch a libc source tree.

Well, I agree with you, but this is not the only function that prints
errors to stderr in libc; and if stderr is closed, you could easily
look at the backtrace in the debugger to find out what went wrong.

| As for malloc(), I don't know what other people do, but I use it only
| with sufficient locking.  It is incompetent to just assume that any
| function that must obviously keep internal state is thread-safe without
| explicit words in documentation.  (Never mind that I avoid malloc()
| in general.)  

Most such functions have been replaced with thread-safe variants. Insisting
on calling the old APIs from threaded contexts makes little sense. It
will cause unexpected behavior and poor performance.
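A small sketch of what such a thread-safe variant looks like in practice, using the localtime()/localtime_r() family mentioned in this thread. It uses gmtime_r() rather than localtime_r() only so the result does not depend on the local time zone; the calling convention is the same. The helper name format_utc() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper: format `t` as UTC into `buf`.
 * Returns 0 on success, -1 on failure.
 */
int
format_utc(time_t t, char *buf, size_t buflen)
{
	struct tm tm;	/* caller-owned result, not libc's static buffer */

	assert(buf != NULL);
	if (gmtime_r(&t, &tm) == NULL)
		return -1;
	if (strftime(buf, buflen, "%Y-%m-%d %H:%M:%S", &tm) == 0)
		return -1;
	return 0;
}
```

The non-reentrant gmtime() and localtime() return a pointer to shared static storage, so two threads calling them concurrently can overwrite each other's result; the _r forms avoid that without any locking in the caller.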

| Besides, the crazy NetBSD philosophy would be to turn malloc() for
| threaded applications into {printf(); abort();} and expect people
| to know by mental telepathy to use an undocumented function that in
| fact does not work.

This is a different case. It is a fine line we've drawn, but nevertheless
it is a line. Using an API that exposes the guts of libc, and that we've
been trying to make obsolete for more than 10 years, is one thing; using
an API that is still valid is something different. I challenge you to find
another OS where a binary that uses _res, created on that OS more than 10
years ago, will work unmodified in the current version of that OS. Yes, we
have kept binary compatibility of _res since that long ago. But we have not
allowed _res to work in multi-threaded programs since we imported the
new multi-threaded aware resolver APIs. Using _res from a multi-threaded
program has never worked on NetBSD.

| It is stupid egotism to change well documented names simply because
| you can.  For decades, starting long before NetBSD existed, people
| including myself have been doing all kinds of stuff under the covers
| in system libraries including libc to preserve APIs.  You only change
| names as with localtime()/localtime_r() when the old API cannot be
| maintained or when you want to offer an improved API. 
| If for some reason you can't offer a new name, you at least document
| the problem.  If you don't do that, you at least delete the old
| documentation!

Yes, I will fix the documentation.
 
| Of course on abstract grounds, res_nfoo() is better than res_foo() just
| as localtime_r() is better than localtime().  But only people who
| shouldn't be allowed near a source tree would willfully and knowingly
| break res_foo() without either documenting res_nfoo() or implementing
| res_foo() as something like res_nfoo(pthread_getspecific())

Ask the libbind maintainers to do this.
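For reference, the res_nfoo(pthread_getspecific()) idea quoted above could be sketched roughly as follows. This is an illustration under stated assumptions, not how libbind or NetBSD's libc is implemented: thread_res_state() and the other names are hypothetical, and only the pthread key functions, res_ninit(), and res_nclose() are real calls.

```c
#include <sys/types.h>
#include <netinet/in.h>
#include <arpa/nameser.h>
#include <resolv.h>
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

static pthread_key_t res_key;
static pthread_once_t res_once = PTHREAD_ONCE_INIT;
static int res_key_ok;

/* Key destructor: runs at thread exit for threads that created a state. */
static void
res_state_free(void *p)
{
	assert(p != NULL);	/* destructors only see non-NULL values */
	res_nclose(p);
	free(p);
}

static void
res_key_init(void)
{
	res_key_ok = (pthread_key_create(&res_key, res_state_free) == 0);
}

/*
 * Hypothetical accessor: return the calling thread's private resolver
 * state, creating it on first use.  Returns NULL on failure.
 */
res_state
thread_res_state(void)
{
	res_state statp;

	if (pthread_once(&res_once, res_key_init) != 0 || !res_key_ok)
		return NULL;
	statp = pthread_getspecific(res_key);
	if (statp != NULL)
		return statp;

	statp = calloc(1, sizeof(*statp));
	if (statp == NULL)
		return NULL;
	if (res_ninit(statp) == -1) {
		free(statp);
		return NULL;
	}
	if (pthread_setspecific(res_key, statp) != 0) {
		res_nclose(statp);
		free(statp);
		return NULL;
	}
	return statp;
}

/*
 * A res_foo()-style compatibility wrapper would then reduce to
 * something like:
 *	return res_nquery(thread_res_state(), name, class, type,
 *	    answer, anslen);
 * with a NULL check added.
 */
```

This keeps the old call signatures while giving each thread its own state. It does not help code that reads or writes the _res structure directly, which is the case discussed in this thread.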

| Finally DO NOT WRITE ME about this stuff.  I care only about the
| code.  I'm not interested in joining a djb style,
| we-re-so-wonderful-because-we-tell-each-other-so cult.
| The nature of NetBSD in this decade is clear regardless of this
| _res silliness.

I did not write you. I replied to a question on why _res aborts in
multi-threaded programs in the mailing list.

christos

