Subject: RFC: migration to a fully dynamically linked system
To: None <tech-userlevel@netbsd.org>
From: Luke Mewburn <lukem@wasabisystems.com>
List: tech-userlevel
Date: 12/21/2001 12:19:58
A long-standing problem in NetBSD is the inability to call dlopen()
from statically linked binaries (or even attempt to link in dlopen()
with -static in some cases). 
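
For anyone not familiar with the interface, here is a minimal sketch of
the call pattern in question. load_symbol() is a hypothetical helper
written for this illustration, not an existing libc function. In a
dynamically linked program it should behave as commented; the dlopen()
call is the part a statically linked binary cannot make.

```c
#include <dlfcn.h>
#include <stdio.h>

/*
 * Hypothetical helper: load a shared object at run time and look up
 * one symbol in it.  Returns the symbol's address, or NULL if either
 * step fails.  On success the handle is left open so that the
 * returned pointer stays valid.
 */
void *
load_symbol(const char *path, const char *symbol)
{
	void *handle, *addr;
	const char *msg;

	handle = dlopen(path, RTLD_LAZY);
	if (handle == NULL) {
		msg = dlerror();
		fprintf(stderr, "dlopen(%s): %s\n", path,
		    msg != NULL ? msg : "unknown error");
		return NULL;
	}
	addr = dlsym(handle, symbol);
	if (addr == NULL) {
		msg = dlerror();
		fprintf(stderr, "dlsym(%s): %s\n", symbol,
		    msg != NULL ? msg : "unknown error");
		dlclose(handle);
	}
	return addr;
}
```

A library such as libc would make a call like this internally to pull
in a back-end module named in a configuration file.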

This prevents the use of dlopen() from within libc and other
libraries, which makes it nearly impossible to solve problems
that are easily solved with dynamically linked modules, such as:
	- nsswitch (e.g., non-BSD licensed LDAP implementations)
	- locale
	- PAM
(amongst others)

Most of the programs in /bin and /sbin use nsswitch in some form or
another. For getpw*() (which you might want to back-end with LDAP,
for example), this includes:
	csh ksh ls mv pax ps rcmd rcp rm sh mount* r?dump* fsck* init
	halt *restore 
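
As a rough sketch of the kind of call these programs make (not taken
from any of them), the hypothetical helper name_to_uid() below resolves
a login name through getpwnam(). Note that the caller never names a
back-end; the library decides where the answer comes from, which is why
the library itself needs to be able to load modules.

```c
#include <sys/types.h>
#include <stddef.h>
#include <pwd.h>

/*
 * Hypothetical helper: resolve a login name to a numeric uid.
 * Returns 0 and stores the uid in *uid on success; returns -1 if
 * the name is unknown to whatever back-ends are configured.
 */
int
name_to_uid(const char *name, uid_t *uid)
{
	struct passwd *pw;

	pw = getpwnam(name);
	if (pw == NULL)
		return -1;
	*uid = pw->pw_uid;
	return 0;
}
```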

I expect that the locale stuff would eventually be used by most programs
in /bin and /sbin.

There are other uses for dlopen() within system libraries as well.


I see 5 possible approaches to solving this problem:


  1. Do nothing for dlopen(); provide other workarounds
	
	Make no change, and find other solutions for the above
	problems. (E.g., a dynamically linked nsswitchd accessed
	via an authenticated local domain socket, etc.)

	Pros:
		- no change (I'm sure this is a positive for some)

	Cons:
		- doesn't solve the underlying problem


  2. Fix dlopen() for static binaries

	Fix the underlying issue, so that dlopen() will work for
	static binaries.  This may not be easily achievable with
	ELF binaries without some kernel help.

	There are issues such as dlopen()ing a library which depends
	upon parts of a dynamic libc (e.g., stdio) which have global
	state; this will cause problems interacting with the static
	parts of the program, which use the compiled-in libc with
	its own copy of that global state.
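
	To make the duplicated-state problem concrete, here is a toy
	model only; it is not real stdio, and every name in it is made
	up for illustration. Two independent copies of buffered-output
	state share one underlying descriptor, and whichever copy
	flushes first decides the order in which the text appears.

```c
#include <string.h>

/* Toy stand-in for one copy of libc's buffered-output state. */
struct toy_stdio {
	char buf[64];
	size_t len;
};

/* Toy stand-in for the single underlying file descriptor. */
struct toy_fd {
	char data[128];
	size_t len;
};

/* Buffer text in one copy's private state; nothing reaches the fd yet. */
static void
toy_puts(struct toy_stdio *s, const char *text)
{
	size_t n = strlen(text);

	if (n > sizeof(s->buf) - s->len)
		n = sizeof(s->buf) - s->len;	/* truncate; fine for a demo */
	memcpy(s->buf + s->len, text, n);
	s->len += n;
}

/* Push one copy's buffered bytes to the shared descriptor. */
static void
toy_flush(struct toy_stdio *s, struct toy_fd *fd)
{
	size_t n = s->len;

	if (n > sizeof(fd->data) - 1 - fd->len)
		n = sizeof(fd->data) - 1 - fd->len;
	memcpy(fd->data + fd->len, s->buf, n);
	fd->len += n;
	fd->data[fd->len] = '\0';
	s->len = 0;
}

/*
 * The program writes "first " through the static copy, then a loaded
 * module writes "second " through the dynamic copy.  Here the dynamic
 * copy flushes first, so the descriptor sees the text out of order.
 */
const char *
toy_demo(struct toy_fd *fd)
{
	struct toy_stdio static_libc = { "", 0 };
	struct toy_stdio dynamic_libc = { "", 0 };

	fd->len = 0;
	fd->data[0] = '\0';
	toy_puts(&static_libc, "first ");
	toy_puts(&dynamic_libc, "second ");
	toy_flush(&dynamic_libc, fd);
	toy_flush(&static_libc, fd);
	return fd->data;
}
```

	Neither copy knows about the other's buffer, so neither can
	put this right on its own.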

	I've had long discussions with other developers on solving
	this particular issue, and the strong impression I get is
	that this is not a trivial problem to solve; it would require
	a significant amount of effort, if it is possible at all.

	Pros:
		- gives a working dlopen()

	Cons:
		- a significant amount of work, and possibly not achievable


  3. Dynamically link everything against /lib

	Provide /lib, which contains /lib/ld.elf_so and all of the
	.so* files from /usr/lib.  /usr/lib still contains the .a,
	_p.a, and _pic.a libraries.  Convert /usr/libexec/ld.elf_so
	to a symlink to /lib/ld.elf_so.

	Dynamically link all of /bin and /sbin and the other bits of
	the tree which are currently statically linked, against /lib.

	Provide an optional recovery mechanism such as /stand/{,s}bin
	which contains statically linked versions of some (or all) of
	the programs in /{s,}bin. Other recovery options include a
	recovery file system which can be loopback mounted (a la vnd),
	a recovery kernel, adding a statically linked netcat and shell,
	etc.

	This is my preferred solution.

	This appears similar to what other systems provide:
		- Red Hat 6.2 (and other Linux systems?) has /lib
		  populated with the .so files that are used by /bin
		  and /sbin (and /usr/lib contains .so files which
		  are installed in /usr/pkg/lib on NetBSD.)
		- Solaris 2.6 has /bin -> /usr/bin, /lib -> /usr/lib.
		  /sbin contains some static programs, some dynamic
		  programs linked against /etc/lib, and some dynamic
		  programs linked against /usr/lib. 
		  (I.e., almost useless if /usr isn't available; not
		  the best example)

	Pros:
		- gives a working dlopen()

		- easy to do; I've already done it and modified the
		  Makefile framework accordingly, except for the
		  recovery tools; /stand is trivially achieved
		  with a reachover makefile framework.

		- cleanest; shared libs are still only in one location

	Cons:
		- adds about 3 MB to / on an i386; the savings gained
		  by dynamically linking /bin and /sbin are offset by
		  the cost of moving /usr/lib/*.so*

		- many people have concerns about this based on prior
		  experience (or folklore) about trashing ld.elf_so
		  and ending up with an unusable system. Providing
		  recovery tools is meant to address this.

		- dynamically linked applications apparently run
		  slower than statically linked applications.
		  Adding support for "prebinding" (in the newer
		  binutils toolchain for some platforms) alleviates
		  most (if not all) of this.


  4. Dynamically link /bin and /sbin against /lib, rest against /usr/lib

	Similar to 3. above, except that only the .so libraries that
	are needed by /bin and /sbin programs are installed into /lib,
	and the rest of *.so remains in /usr/lib.

	Pros:
		- gives a working dlopen()

		- fairly easy to do (a bit more work than 3.)

		- saves about 12 MB in / on an i386

	Cons:
		- "system" shared libraries are spread between
		  /lib and /usr/lib; possible user confusion

		- many people have concerns about this based on prior
		  experience (or folklore) about trashing ld.elf_so
		  and ending up with an unusable system. Providing
		  recovery tools is meant to address this.

		- dynamically linked applications apparently run
		  slower than statically linked applications.
		  Adding support for "prebinding" (in the newer
		  binutils toolchain for some platforms) alleviates
		  most (if not all) of this.


  5. Install dynamic versions of /{,s}bin programs into /usr/{,s}bin

	Install dynamic versions of programs from /bin into /usr/bin,
	and from /sbin into /usr/sbin, leaving the existing /bin and
	/sbin alone.

	Change default paths to have /usr/bin before /bin and
	/usr/sbin before /sbin.

	/{,s}bin contains the `minimalist' versions of applications,
	and /usr/{,s}bin contains the versions that support the full
	range of dynamic nsswitch (et al) functionality.

	This effectively means that /bin and /sbin are the "boot" and
	"recovery" directories, and /usr/bin and /usr/sbin are the
	"runtime" directories.

	Pros:
		- little change from existing / setup

	Cons:
		- lots of user confusion

		- duplication of applications in tree

		- requires changing user paths


Comments?

(let the flood gates open)

-- 
Luke Mewburn  <lukem@wasabisystems.com>  http://www.wasabisystems.com
Luke Mewburn     <lukem@netbsd.org>      http://www.netbsd.org
Wasabi Systems - NetBSD hackers for hire
NetBSD - the world's most portable UNIX-like operating system