tech-userlevel archive


Re: Small ld.elf_so speed up



On Thu, 1 Apr 2010 19:01:26 +0200
Joerg Sonnenberger <joerg%britannica.bec.de@localhost> wrote:

> On Thu, Apr 01, 2010 at 05:45:51PM +0100, Sad Clouds wrote:
> > So, if you have an application that is linked to a total of 10
> > shared libraries. Each of those libraries exports 50 symbols. The
> > application references all of those symbols, that is 10 * 50 = 500
> > symbols. This then increases load time.
> 
> No. It doesn't matter how many symbols a library *exports*. The
> question is, how many relocations have to be resolved. That is, how
> many undefined symbols are present. There is one exception here in
> that global symbols in the same DSO may take some short cuts if they
> are not exported, but that is not relevant for runtime linker
> overhead.
> 
> [snip]
> > I don't know how dynamic linker is implemented, but I've been
> > developing some of my packages/libraries as described above. I also
> > added calls to pthread_mutex_lock() to make package init() and
> > destroy() functions thread-safe.
> 
> You have essentially reimplemented what the dynamic linker does. Just
> in a more expensive way. It is more expensive in terms of per-call
> overhead as indirect calls can on most CPUs be considered as
> mispredicted branch. It has larger startup overhead, because the
> relocations can't be done lazy.
> 
> Joerg

Joerg, I ran a few tests, and they seem to indicate that declaring
functions 'static' and then exporting them via function pointers is not
more expensive. Quite the opposite.

I built two versions of a shared library and a main program.

The first version does it the normal way, letting the linker resolve
all undefined symbols:

./libtest.so.0
        This shared library has 10000 simple functions of the form:

        int fn_0(int n) { return n++; }
        ...
        int fn_9999(int n) { return n++; }

./test_main
        This main program is linked to the above library and has 10000
        function calls of the form:

        fn_0(1);
        ...
        fn_9999(1);


The second version declares all functions 'static' and exports them via
function pointers, the way I described in my previous email:

./libtest2.so.0
        This shared library has 10000 simple functions of the form:
        
        static int priv_fn_0(int n) { return n++; }
        ...
        static int priv_fn_9999(int n) { return n++; }

        These are then exported at run time with the following
        assignments:

        (*pkg)->fn_0 = &priv_fn_0;
        ...
        (*pkg)->fn_9999 = &priv_fn_9999;

./test2_main
        This main program is linked to the above library and has 10000
        function calls of the form:

        test2_init(&test2);

        test2->fn_0(1);
        ...
        test2->fn_9999(1);
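
For anyone who wants to reproduce the second layout, here is a minimal
sketch of it cut down to two functions. Only test2_init(), the priv_fn_N
names and the (*pkg)->fn_N assignments come from the test above. The
struct name test2_pkg, the malloc() allocation, test2_destroy() and the
function bodies are assumptions made for illustration, and the mutex
locking mentioned earlier in the thread is left out.

```c
#include <stdlib.h>

/* Hypothetical table of entry points; the real test has 10000 members. */
struct test2_pkg {
	int (*fn_0)(int);
	int (*fn_1)(int);
};

/* 'static' gives these internal linkage, so they never appear in the
 * library's dynamic symbol table. The bodies are placeholders. */
static int priv_fn_0(int n) { return n + 1; }
static int priv_fn_1(int n) { return n + 2; }

/* One of the few symbols the library still has to export.
 * Returns 0 on success, -1 if the table cannot be allocated. */
int
test2_init(struct test2_pkg **pkg)
{
	*pkg = malloc(sizeof(**pkg));
	if (*pkg == NULL)
		return -1;

	/* Fill in the table at run time, as in libtest2.so.0. */
	(*pkg)->fn_0 = &priv_fn_0;
	(*pkg)->fn_1 = &priv_fn_1;
	return 0;
}

/* Assumed counterpart to test2_init(): releases the table. */
void
test2_destroy(struct test2_pkg **pkg)
{
	free(*pkg);
	*pkg = NULL;
}
```

The caller declares a 'struct test2_pkg *test2', calls
test2_init(&test2) once, and then reaches every function through the
table, e.g. test2->fn_0(1).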


Below are some statistics for both programs:

p3smp$ ls -lh libtest.so.0 test_main
-rwxr-xr-x  1 rom  wheel  617K Apr  1 21:13 libtest.so.0
-rwxr-xr-x  1 rom  wheel  936K Apr  1 21:17 test_main 

p3smp$ size ./test_main
   text    data     bss     dec     hex filename
 673380   40280      36  713696   ae3e0 ./test_main

p3smp$ size ./libtest.so.0
   text    data     bss     dec     hex filename
 391952     116       0  392068   5fb84 ./libtest.so.0

p3smp$ nm test_main | grep U | wc -l
   10003

p3smp$ time ./test_main
        0.02 real         0.02 user         0.00 sys

----------------------------------------------------------

p3smp$ ls -lh libtest2.so.0 test2_main
-rwxr-xr-x  1 rom  wheel  480K Apr  1 21:10 libtest2.so.0
-rwxr-xr-x  1 rom  wheel  181K Apr  1 21:15 test2_main

p3smp$ size ./test2_main
   text    data     bss     dec     hex filename
 181676     284      40  182000   2c6f0 ./test2_main

p3smp$ size ./libtest2.so.0
   text    data     bss     dec     hex filename
 200452     160       0  200612   30fa4 ./libtest2.so.0

p3smp$ nm test2_main | grep U | wc -l
       4

p3smp$ time ./test2_main
        0.00 real         0.00 user         0.00 sys

As you can see above:

1. libtest2.so.0 is 137K smaller on disk (480K vs. 617K)
2. test2_main is 755K smaller (181K vs. 936K, about 5 times smaller)
3. test2_main has only 4 undefined symbols, compared to 10003 for
   test_main
4. test2_main load/run time is lower (0.00 vs. 0.02 seconds real)

