Subject: shared library trouble
To: None <current-users@NetBSD.ORG>
From: der Mouse <mouse@Holo.Rodents.Montreal.QC.CA>
List: current-users
Date: 08/18/1996 21:30:57
Background: I'm trying to install X11R6.1p1 on NetBSD/sun3 on a
Sun-3/260 running a July 3 -current (= 1.2 release branch).

After much struggling, some with NetBSD bugs and some with X bugs (a
full report will follow once I have it working), I got it to the point
where it passed a basic smoke test in the build directory; I could run
	./xinit ../xterm/xterm -- ../programs/Xserver/Xsun :0
and it came up, with a few minor warnings.  So I did a "make install"
and tried to use the installed version, and I ran smack into completely
inscrutable (to me) behavior by our ld.so.

The problem is that with LD_LIBRARY_PATH set to one value, ldd on an
executable finds all the libraries.  With it set to another value,
pointing to a directory that's "the same", it finds the libraries, then
complains about not being able to find the same libraries.

Specifically:

/local/.lib/X11-R6.1p1 is the directory where the libraries get
installed; /local/src/PENDING.X11R6/X11-R6.1p1/xc/usrlib is where they
get pseudo-installed during the build, via symlinks into the source
directories.  The contents of these directories differ slightly, but I
think the differences are not what's causing my trouble:

/local/.lib/X11-R6.1p1		/local/src/.../usrlib:
X11
libFS.a				libFS.a
libICE.so.6.0			libICE.so.6.0
libPEX5.so.6.0			libPEX5.so.6.0
libSM.so.6.0			libSM.so.6.0
libX11.so.6.1			libX11.so.6.1
libXIE.so.6.0			libXIE.so.6.0
libXau.a			libXau.a
libXaw.so.6.1			libXaw.so.6.1
libXdmcp.a			libXdmcp.a
libXext.so.6.1			libXext.so.6.1
libXi.so.6.0			libXi.so.6.0
libXmu.so.6.0			libXmu.so.6.0
libXt.so.6.0			libXt.so.6.0
libXtst.so.6.1			libXtst.so.6.1
				libfont.a
liboldX.so.6.0			liboldX.so.6.0
libxkbfile.a			libxkbfile.a

I checked that the files in question are identical, once the symlinks
get followed:

for i in Xmu Xt ICE SM Xext X11
do
	cmp /local/.lib/X11-R6.1p1/lib$i.so.* /local/src/PENDING.X11R6/X11-R6.1p1/xc/usrlib/lib$i.so.*
done

which produced no output.
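
Since silence from cmp is easy to misread, the same check can be made to say explicitly what it compared. The following is only a sketch; the function name compare_libs is made up for illustration and is not part of any existing tool.

```shell
# compare_libs DIR_A DIR_B NAME...
# For each NAME, compare every DIR_A/libNAME.so.* against the file of the
# same name in DIR_B (cmp follows symlinks) and print one line per file.
# Returns nonzero if anything is missing or differs.
compare_libs() {
    dir_a=$1
    dir_b=$2
    shift 2
    status=0
    for name in "$@"
    do
        for a in "$dir_a"/lib"$name".so.*
        do
            b=$dir_b/${a##*/}
            if [ ! -e "$a" ] || [ ! -e "$b" ]; then
                # Covers both an unmatched glob and a file absent from DIR_B.
                echo "lib$name: missing"
                status=1
            elif cmp -s "$a" "$b"; then
                echo "${a##*/}: identical"
            else
                echo "${a##*/}: DIFFERENT"
                status=1
            fi
        done
    done
    return $status
}
```

With the directories above, it would be invoked as
compare_libs /local/.lib/X11-R6.1p1 /local/src/PENDING.X11R6/X11-R6.1p1/xc/usrlib Xmu Xt ICE SM Xext X11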

In /local/.bin/X11-R6.1p1 (the binaries install directory), with
LD_LIBRARY_PATH set to /local/src/PENDING.X11R6/X11-R6.1p1/xc/usrlib,
% ldd ./xinit:
	-lXmu.6 => /local/src/PENDING.X11R6/X11-R6.1p1/xc/usrlib/libXmu.so.6.0 (0x20a0000)
	-lXt.6 => /local/src/PENDING.X11R6/X11-R6.1p1/xc/usrlib/libXt.so.6.0 (0x20c0000)
	-lICE.6 => /local/src/PENDING.X11R6/X11-R6.1p1/xc/usrlib/libICE.so.6.0 (0x2100000)
	-lSM.6 => /local/src/PENDING.X11R6/X11-R6.1p1/xc/usrlib/libSM.so.6.0 (0x2120000)
	-lXext.6 => /local/src/PENDING.X11R6/X11-R6.1p1/xc/usrlib/libXext.so.6.1 (0x2140000)
	-lX11.6 => /local/src/PENDING.X11R6/X11-R6.1p1/xc/usrlib/libX11.so.6.1 (0x2160000)
	-lc.12 => /usr/lib/libc.so.12.5 (0x21e0000)

Yet with LD_LIBRARY_PATH set to /local/.lib/X11-R6.1p1, and no other
changes,
% ldd ./xinit:
	-lXmu.6 => /local/.lib/X11-R6.1p1/libXmu.so.6.0 (0x20a0000)
	-lXt.6 => /local/.lib/X11-R6.1p1/libXt.so.6.0 (0x20c0000)
	-lICE.6 => /local/.lib/X11-R6.1p1/libICE.so.6.0 (0x2100000)
	-lSM.6 => /local/.lib/X11-R6.1p1/libSM.so.6.0 (0x2120000)
	-lXext.6 => /local/.lib/X11-R6.1p1/libXext.so.6.1 (0x2140000)
	-lX11.6 => /local/.lib/X11-R6.1p1/libX11.so.6.1 (0x2160000)
	-lc.12 => /usr/lib/libc.so.12.5 (0x21e0000)
	-lXt.6 => not found (0x0)
	-lICE.6 => not found (0x0)
	-lSM.6 => not found (0x0)
	-lXext.6 => not found (0x0)
	-lX11.6 => not found (0x0)
	-lICE.6 => not found (0x0)
	-lSM.6 => not found (0x0)
	-lICE.6 => not found (0x0)
	-lX11.6 => not found (0x0)

Now, some of those libraries are linked with others; for example,
libXt.so was linked with -L../../usrlib -lICE -lSM.  But what I cannot
understand is why it works differently between the two cases and why
the libraries are not found in the second case.

I get the same behavior if I'm in the xinit source directory, running
ldd on the executable there.  If it makes any difference, the
executable (xinit) is linked with -R/local/.lib/X11-R6.1p1.

With LD_LIBRARY_PATH set to the /local/src/.../usrlib path, everything
runs enough to bring up a server with an xterm, so there is nothing
fundamentally wrong with the executables.

I had a quick look at /usr/src/gnu/usr.bin/ld/rtld, but there does not
appear to be any way to cause it to give a detailed trace (looking for
-lXmu.6, referenced from executable, trying /local/.lib/..., succeeded,
looking for -lX11 referenced from /local/.lib/.../libXmu.so.6.0, etc),
which is the sort of thing I'd need to get anywhere with this.
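
For illustration only, here is the shape of trace meant, approximated outside the loader. This sketch does NOT reproduce ld.so's real search (it ignores -R paths, the ld.so hints file, and minor-number selection); it only walks a colon-separated path looking for lib<name>.so.<major>.* and says what it tried. The function name trace_lookup is hypothetical.

```shell
# trace_lookup -lNAME.MAJOR PATH
# PATH is a colon-separated directory list, as in LD_LIBRARY_PATH.
# Prints one line per directory tried; returns 0 on the first match.
trace_lookup() {
    spec=${1#-l}          # e.g. "Xmu.6"
    name=${spec%%.*}      # e.g. "Xmu"
    major=${spec#*.}      # e.g. "6"
    old_ifs=$IFS
    IFS=:
    set -- $2             # split the path on colons into "$@"
    IFS=$old_ifs
    for dir in "$@"
    do
        for cand in "$dir"/lib"$name".so."$major".*
        do
            if [ -e "$cand" ]; then
                # First match in glob order, not necessarily highest minor.
                echo "looking for -l$spec: trying $dir, found $cand"
                return 0
            fi
        done
        echo "looking for -l$spec: trying $dir, nothing"
    done
    echo "looking for -l$spec: not found"
    return 1
}
```

For example: trace_lookup -lXmu.6 "$LD_LIBRARY_PATH"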

Before I dive in and start adding such code, does anyone have any
suggestions about what to do?  In particular, I've heard of people
building X11R6 on NetBSD; has anyone dealt with this problem before?

Before anyone asks why one .so file is linked with -l options referring
to others, that's necessary in order to get things to start up at all.
Before I did that, I got complaints about _SmcSaveYourselfDone being
undefined in libXt.so.6.0; adding -lSM to the link, either of the
program in question or of libXt.so.6.0, cured that; I found I had to
add another thing as well.  (This is one of the NetBSD bugs: the linker
should have complained about _SmcSaveYourselfDone being undefined
during at least one of the ld runs involved.  I'm going to send-pr that
once I come up for air from getting X working - which I will, even if I
have to tell it to not use .so libraries at all.)  So I did a bunch of
nm runs, piped their output through an awk script, and mechanically
generated a list of what .so files referred to symbols from what other
.so files.  Thanks to some stupidity somewhere in the X setup, I had to
delete a couple of those, but everything links and, except for the
above trouble, runs.
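
The nm-and-awk step described above could look roughly like the following. This is a reconstruction under assumptions, not the script actually used: it assumes BSD-style nm output ("address type name" for defined symbols, "type name" for undefined ones), treats only T, D and B as definitions, and the function names nm_records and lib_deps are invented here.

```shell
# nm_records FILE...: emit "LIB TYPE SYMBOL" for each external symbol.
nm_records() {
    for f in "$@"
    do
        nm -g "$f" | awk -v lib="${f##*/}" '
            NF == 2 { print lib, $1, $2 }   # undefined: no address column
            NF == 3 { print lib, $2, $3 }   # defined: address, type, name
        '
    done
}

# lib_deps: read "LIB TYPE SYMBOL" records on stdin and print "A B" for
# every pair where A has an undefined symbol that B defines.
lib_deps() {
    awk '
        $2 == "U"      { undef[$1 SUBSEP $3] = 1; next }
        $2 ~ /^[TDB]$/ { if (!($3 in def)) def[$3] = $1 }
        END {
            # Resolve at the end, since a reference may precede its definition.
            for (key in undef) {
                split(key, part, SUBSEP)
                lib = part[1]; sym = part[2]
                if (sym in def && def[sym] != lib)
                    print lib, def[sym]
            }
        }
    ' | sort -u
}
```

Run as "nm_records lib*.so.* | lib_deps"; each output line "A B" suggests adding B's -l option when linking A, subject to the same hand pruning mentioned above.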

Any help would be much appreciated.

					der Mouse

			    mouse@collatz.mcrcim.mcgill.edu
		    01 EE 31 F6 BB 0C 34 36  00 F3 7C 5A C1 A0 67 1D