Subject: Re: how to deal with libc vs. libresolve discrepancies?
To: NetBSD User-Level Technical Discussion List <tech-userlevel@NetBSD.org>
From: Greg A. Woods <woods@weird.com>
List: tech-pkg
Date: 12/20/2006 13:38:18

At Tue, 19 Dec 2006 14:32:52 +0100,
Joerg Sonnenberger wrote:
>
> On Mon, Dec 18, 2006 at 01:01:58PM -0500, Greg A. Woods wrote:
> > configure:3906: checking for pcap_lookupdev in -lpcap
> > configure:3939: cc -o conftest -O2 -mno-soft-float -mcpu=21164a -g -mieee -pipe -mieee -I/usr/include -I/usr/include -static -L/usr/lib -Wl,-R/usr/pkg/lib conftest.c -lpcap  -lm -lresolv  >&5
> > /usr/lib/libc.a(res_query.o): In function `res_querydomain':
> > /building/work/woods/m-NetBSD-1.6/lib/libc/net/res_query.c:331: multiple definition of `res_querydomain'
> > /usr/lib/libresolv.a(res_query.o):/building/work/woods/m-NetBSD-1.6/lib/libresolv/../libc/net/res_query.c:331: first defined here
> > ld: Warning: size of symbol `res_querydomain' changed from 448 to 496 in res_query.o
> > /usr/lib/libc.a(res_mkquery.o): In function `__res_opt':
> > /building/work/woods/m-NetBSD-1.6/lib/libc/net/res_mkquery.c:208: multiple definition of `__res_opt'
> > /usr/lib/libresolv.a(res_mkquery.o):/building/work/woods/m-NetBSD-1.6/lib/libresolv/../libc/net/res_mkquery.c:208: first defined here
>
> Just guessing. libc is requesting a symbol from res_query.o, which is
> not also provided by libresolv.a, therefore pulling in both objects.

That should not be possible given that they're both compiled from the
exact same source and they both have exactly the same modularization.
Of course that's not exactly what happens under the hood, but still the
major public objects are all theoretically still identical.

Even worse, in all my examination of the relevant symbol tables I've
been unable to find anything that the libresolv.a variant of res_query.o
provides and the libc variant does not.  In fact to my eyes the opposite
is true.  Hmmm....  unless something else unrelated in libc is forcing
use of one of the renamed symbols when it probably shouldn't.

Now that I look at this again, though, perhaps the problem is due to the
fact that libc has never been properly versioned and instead we keep
relying on these weak symbols to maintain some semblance of a backwards
compatible ABI for the sake of the over-use of dynamic linking.  Perhaps
in this twisted reality libresolv should have maintained the same libc
ABI bass-ackwards compatibility too, and then these (and similar) object
modules wouldn't be incompatible variants of each other.
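
To make the weak-symbol point concrete, here is a rough Python model of
how a linker combines several definitions of one symbol name once the
objects are already part of the link.  It is only a sketch of the usual
strong-beats-weak rule, not the actual ld code; the resolve() function
and the MultipleDefinition exception are names made up for illustration.
The 'T' and 'W' letters are the same ones nm prints in the listings
that follow.

```python
# Simplified model of combining definitions of ONE symbol name.
# Assumption: this mirrors the usual strong/weak rule; it is not ld itself.

class MultipleDefinition(Exception):
    """Raised when two strong (global) definitions of a symbol collide."""


def resolve(definitions):
    """definitions: list of (object_name, binding) in link order, where
    binding is 'T' (strong global text) or 'W' (weak).
    Returns the name of the object whose definition is used."""
    if not definitions:
        raise ValueError("no definitions given")
    strong = [obj for obj, binding in definitions if binding == "T"]
    if len(strong) > 1:
        raise MultipleDefinition(
            f"{strong[1]}: multiple definition; first defined in {strong[0]}"
        )
    if strong:
        return strong[0]          # a strong definition overrides weak ones
    return definitions[0][0]      # only weak definitions: first one wins


# res_query is strong in libresolv's object but only weak in libc's,
# so the two can coexist and the strong one is used.
winner = resolve([("libresolv.a(res_query.o)", "T"),
                  ("libc.a(res_query.o)", "W")])
print("res_query comes from", winner)

# res_querydomain is strong in both objects, so there is no way to merge.
try:
    resolve([("libresolv.a(res_query.o)", "T"),
             ("libc.a(res_query.o)", "T")])
except MultipleDefinition as err:
    print("error:", err)
```

Under this rule res_query and res_search would merge quietly, while
res_querydomain cannot, which is at least consistent with the linker
complaining only about the latter.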

$ cd /usr/src/lib/libc
$ cd / $MY_OBJDIR/
/build/woods/whats/NetBSD-1.6.x-alpha-alpha-21164a-obj/usr/src/lib/libc
$ nm res_query.o
0000000000000000 a *ABS*
                 U __errno
                 U __res_opt
                 U __res_send
                 U _res
                 U _res_init
                 U _res_mkquery
0000000000000000 T _res_query
00000000000002e0 T _res_search
0000000000000004 C h_errno
                 U ntohs
                 U printf
0000000000000000 W res_query
0000000000000640 T res_querydomain
00000000000002e0 W res_search
                 U sprintf
                 U strlen
                 U strncpy
$ cd ../libresolv
$ nm res_query.o
0000000000000000 a *ABS*
                 U __errno
                 U __res_opt
                 U __res_send
                 U _res
0000000000000004 C h_errno
                 U ntohs
                 U res_init
                 U res_mkquery
0000000000000000 T res_query
00000000000005a0 T res_querydomain
0000000000000240 T res_search
                 U sprintf
                 U strlen
                 U strncpy
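
The comparison of the two listings can also be done mechanically.  The
following Python sketch parses the nm output quoted above and reports
which text symbols each object defines and which are strong ('T') in
both.  The helper names parse_nm(), defined() and strong_clashes() are
invented for this example, and treating only 'T' and 'W' entries as
definitions is a simplifying assumption.

```python
# nm output copied from the two res_query.o listings above.
LIBC_NM = """\
0000000000000000 a *ABS*
                 U __errno
                 U __res_opt
                 U __res_send
                 U _res
                 U _res_init
                 U _res_mkquery
0000000000000000 T _res_query
00000000000002e0 T _res_search
0000000000000004 C h_errno
                 U ntohs
                 U printf
0000000000000000 W res_query
0000000000000640 T res_querydomain
00000000000002e0 W res_search
                 U sprintf
                 U strlen
                 U strncpy
"""

LIBRESOLV_NM = """\
0000000000000000 a *ABS*
                 U __errno
                 U __res_opt
                 U __res_send
                 U _res
0000000000000004 C h_errno
                 U ntohs
                 U res_init
                 U res_mkquery
0000000000000000 T res_query
00000000000005a0 T res_querydomain
0000000000000240 T res_search
                 U sprintf
                 U strlen
                 U strncpy
"""


def parse_nm(text):
    """Return {symbol_name: type_letter} for each symbol line."""
    symbols = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:        # address, type, name
            kind, name = fields[1], fields[2]
        elif len(fields) == 2:      # undefined symbol: type, name
            kind, name = fields
        else:
            continue
        if len(kind) == 1:          # skip anything that is not a type letter
            symbols[name] = kind
    return symbols


def defined(symbols):
    """Names defined as text symbols, strong ('T') or weak ('W')."""
    return {name for name, kind in symbols.items() if kind in "TW"}


def strong_clashes(a, b):
    """Names that are strong text definitions in both objects."""
    return sorted(n for n in a if a[n] == "T" and b.get(n) == "T")


libc = parse_nm(LIBC_NM)
libresolv = parse_nm(LIBRESOLV_NM)

print("only in libc:     ", sorted(defined(libc) - defined(libresolv)))
print("only in libresolv:", sorted(defined(libresolv) - defined(libc)))
print("strong in both:   ", strong_clashes(libc, libresolv))
```

For these two listings the only names defined by the libc object alone
are _res_query and _res_search, the libresolv object defines nothing
that libc lacks, and res_querydomain is the one symbol that is strong
in both.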

I still can't find anything else in libc.a though which uses the private
_res_query or _res_search symbols which wouldn't already be satisfied by
an object module already pulled in from libresolv.a.  Perhaps I'm not
looking hard enough, or perhaps in some cases like this config test the
application simply isn't using the full API and thus isn't actually
pulling in enough of libresolv.a first before libc comes along.


> That's a known problem and limitation of static linking.

Well, no, not really.  Blaming it on static linking is like, well, I
can't think of an appropriate analogy for such an inappropriate claim.

It's even more of a problem for dynamic linking sometimes because then
you might just get a crash or corruption or other mis-behaviour instead
of finding out at link time that you're trying to merge incompatible
object code.  At least with a static link you sometimes know sooner that
you've bodged things up (especially when the linker is courteous enough
to show the obvious problems, e.g. different-sized objects), and at
least you always get a firmly predictable and _static_ mapping of
symbols to addresses.

Assuming dynamic linking can actually fix something like this in a safe
and secure manner is incredibly naive.  All dynamic linking can do is
hide these kinds of problems until run time.  If any *BSD were popular
enough that thousands of software vendors (including the OS vendor) each
shipped patched variants of libc.so, just as happens in the windoze
world, then *BSD users would scream just as loudly and pull just as much
hair as all those windoze users who are frustrated by DLL-hell.  Suppose,
for example, that proprietary ethereal and mozilla binary package
bundles came from separate vendors, each built against an incompatible
variant of, say, libpango, and that their installers hid the clash.
Then I'd be just as screwed as a windoze user having the same problems.  My
applications would begin to crash in strange ways and none of my vendors
would accept responsibility and any ordinary user would be helpless to
discover the problem on their own.  Look at the necessary effort that
has been expended in pkgsrc (and every other open source OS binary
packaging system) to avoid the same kind of shared library madness.  We
are lucky, but it is NOT by design -- it comes from the brute force
effort expended by many people co-operating to create a unified build
architecture which is extremely carefully monitored to avoid and
eliminate clashes between shared library variants.  Try that in the
commercial software world!

-- 
						Greg A. Woods

H:+1 416 218-0098 W:+1 416 489-5852 x122 VE3TCP RoboHack <woods@robohack.ca>
Planix, Inc. <woods@planix.com>       Secrets of the Weird <woods@weird.com>
