Subject: Re: HEADS UP: migration to fully dynamic linked "base" system
To: Luke Mewburn <lukem@wasabisystems.com>
From: David Laight <david@l8s.co.uk>
List: current-users
Date: 09/24/2002 18:05:05
On Wed, Sep 25, 2002 at 01:58:37AM +1000, Luke Mewburn wrote:
> On Tue, Sep 24, 2002 at 02:39:14PM +0100, David Laight wrote:
>   | 1) If I build a program it still has /usr/libexec/ld.elf_so as its
>   |    interpreter  (the system stuff all has /libexec/ld.elf_so).
>   |    Following the extra symlink does take measurable time
> 
> If it really causes a user grief with their old applications, rewrite the
> .interp section in the elf header; there should be room to rewrite
> "/usr/libexec/ld.elf_so" -> "/libexec/ld.elf_so".  I'd rather not have
> two copies of these programs, for something which the namei() cache
> should be caching anyway.

For existing binaries the symlink is (probably) fine.
But the toolchain ought to be adding the correct interpreter
for newly linked programs - and it doesn't seem to.
(Rewriting the .interp section is just an exercise in printf and dd.)
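
For anyone who would rather not do it by hand, here is a rough Python
sketch of the same in-place overwrite. It is only an illustration: the
function name patch_interp is made up, it works on a bytes image rather
than a file, and it assumes that the first NUL-terminated occurrence of
the old path really is the .interp string and that the loader accepts
extra trailing NULs (the length in the program header is not touched).

```python
OLD_INTERP = b"/usr/libexec/ld.elf_so"
NEW_INTERP = b"/libexec/ld.elf_so"


def patch_interp(image: bytes,
                 old: bytes = OLD_INTERP,
                 new: bytes = NEW_INTERP) -> bytes:
    """Return a copy of image with the interpreter path overwritten.

    Hypothetical helper: it does not parse the ELF headers, it just
    searches for the old NUL-terminated path and overwrites it in place.
    """
    if len(new) > len(old):
        raise ValueError("new interpreter path does not fit in the old slot")
    needle = old + b"\0"
    offset = image.find(needle)
    if offset < 0:
        raise ValueError("old interpreter path not found")
    # Pad with NULs so every later byte keeps its original file offset.
    padded = new + b"\0" * (len(needle) - len(new))
    return image[:offset] + padded + image[offset + len(needle):]
```

Padding to the original length is the point of the exercise: nothing
after the string moves, so no other offsets in the file need fixing up.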

>   | 2) ldd disagrees with ld.elf_so as to which libraries are loaded.
>   |    ldd still looks in /usr/lib first...
>   |    I can't actually see where ld.elf_so gets the default path from :-(
> 
> Is it compiled in?  Check the Makefiles for ldd and ld.elf_so.  Is ldd
> using the rpath?

Thanks - I just couldn't see it in the source :-(
ld.elf_so is built with  -DRTLD_DEFAULT_LIBRARY_PATH=\"/lib:/usr/lib\"
but ldd_elf only has -DLIBDIR=\"/usr/lib\"

>   | 3) The big killer on loading things like netscape is symbol name
>   |    lookup for relocations [1].
>   |    I suspect this will only be solved using an entirely different
>   |    structure for the symbol table.
>   |    A rough guess is that the hash function is the most significant
>   |    bit!  Maybe I'll code an asm version...
Hash isn't THAT significant...
> 
> I believe Bang Jun-Young and Charles Hannum are working on this.

So am I....
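
On the hash function mentioned above: assuming the loader uses the
standard System V ELF hash (I have not quoted the ld.elf_so source
here, so treat that as an assumption), the work per symbol is a short
shift-and-xor loop over the name, sketched in Python below. The name
elf_hash is only for illustration.

```python
def elf_hash(name: bytes) -> int:
    """Standard System V ABI ELF symbol hash, kept to 32 bits."""
    h = 0
    for byte in name:
        h = ((h << 4) + byte) & 0xFFFFFFFF
        g = h & 0xF0000000
        if g:
            # Fold the top nibble back into the low bits.
            h ^= g >> 24
        h &= ~g & 0xFFFFFFFF
    return h
```

The cost is linear in the length of the symbol name, and it is paid
once per lookup, however many objects are then searched.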

Something is definitely wrong with the DAG list: some parts of
mozilla end up with lists that reference the same object 24
times - and we check for the symbol each time - before giving
up, checking the 'global' list, and finally using the weak symbol
from the program body (aka libc).
Worst so far is looking up 'lseek' - 3ms, 270 checks of about 30
libraries.
OTOH this isn't the performance killer.
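
To show what the duplicate entries cost, here is a toy Python model of
walking such a list with and without skipping objects that have already
been checked. It is not the rtld code: the function lookup, the
(name, symbols) pairs and the library names are all invented for
illustration.

```python
def lookup(symbol, search_list, skip_seen=False):
    """Walk search_list looking for symbol.

    search_list is a list of (object_name, set_of_defined_symbols).
    Returns (defining_object_name_or_None, number_of_objects_checked).
    """
    checks = 0
    seen = set()
    for obj_name, defined in search_list:
        if skip_seen:
            if obj_name in seen:
                continue  # already searched this object once
            seen.add(obj_name)
        checks += 1
        if symbol in defined:
            return obj_name, checks
    return None, checks


# Two made-up libraries, each listed 24 times, neither defining lseek.
dag_list = [("libA.so", {"a_func"}), ("libB.so", {"b_func"})] * 24
```

With this list a failed lookup costs 48 checks as-is and 2 checks when
repeated objects are skipped.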

The actual 'problem' is that references to code functions within a
library have to be looked up (rather than being fixed to relative
jumps by the linker).  This is an ELF 'feature' that makes it easy
for an application program to completely kill the operation of a
library by 'accidentally' defining a function with the same name
as a library function.

The only workaround I know is to build the shared library by
'cat'ing all the .c files together so that the internal functions
can be static, and using wrappers for functions that are called
both internally and externally.
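
A toy Python model of the two paragraphs above, for anyone who has not
been bitten by this: a call made from inside a library is bound through
the global search order unless the target is static. Everything here
(resolve, bind_internal_call, prog, libfoo.so, log_msg) is hypothetical
and only models the binding rule, it is not loader code.

```python
def resolve(symbol, search_order):
    """Return the name of the first object in search_order defining symbol."""
    for obj_name, defined in search_order:
        if symbol in defined:
            return obj_name
    return None


# The program 'accidentally' defines log_msg, which libfoo also defines.
PROGRAM = ("prog", {"main", "log_msg"})
LIBFOO = ("libfoo.so", {"foo_init", "log_msg"})
GLOBAL_ORDER = [PROGRAM, LIBFOO]  # the program is searched first


def bind_internal_call(symbol, caller, static_symbols=frozenset()):
    """Where does a call to symbol made from inside caller end up?

    static_symbols models functions made static inside the library, which
    the compiler binds directly and the loader never looks up.
    """
    if symbol in static_symbols:
        return caller
    return resolve(symbol, GLOBAL_ORDER)
```

Without the static trick, bind_internal_call("log_msg", "libfoo.so")
lands in "prog"; with static_symbols={"log_msg"} it stays in the
library and needs no lookup at load time.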

	David

-- 
David Laight: david@l8s.co.uk