Subject: Re: bin/35589 (Dynamic linking is very slow)
To: None <skrll@NetBSD.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,>
From: David Laight <david@l8s.co.uk>
List: netbsd-bugs
Date: 02/13/2007 18:30:02
The following reply was made to PR bin/35589; it has been noted by GNATS.

From: David Laight <david@l8s.co.uk>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: bin/35589 (Dynamic linking is very slow)
Date: Tue, 13 Feb 2007 18:26:45 +0000

 On Tue, Feb 13, 2007 at 05:21:07PM +0000, skrll@netbsd.org wrote:
 
 Some additional data from some experiments I did a couple of years ago.
 
 The elf symbol table is hashed modulo the first prime less than power
 of two below the number of symbols.
 This means that, on average, the hash chains are 1.5 entries long.
 Almost every symbol has to be checked against every library (because they
 are either 'weak' or defined in the program body), so you get 1.5 strcmp()
 calls per library per symbol lookup.
 
 OTOH were the module 2^n-1 being somewhat greater than the number of symbols:
 1) The modulo operation could be a shift+mask
 2) The hash chains would often be empty
 Both would speed things up.
 Changing the modulo had little effect on the maximum chain length for libc.
 
 Additionally the length of the symbol name is known [1], so the strcmp() can
 be replaced by the faster memcmp().  However since the comparisons usually
 fail on the first few bytes, you don't want the inlined memcmp() code!
 
 Another improvement would to get the elf string table to have all the
 strings 4 byte aligned.
 
 For large C++ programs (I played with mozilla) it is actually worth
 merging all the symbol tables for the libraries loaded at program load time
 into a single large table - this only required a single hashed lookup for
 each symbol.
 
 I also suspect that the 'lazy' fixup of code symbols causes disk thrashing
 later on as the required pages of all the elf headers are faulted in
 non-sequentially.  Whereas if the entire elf header of each library was
 sequentially read in, then all the PLT fixups done, the time waiting for
 the disk would decrease considerably.
 
 It is also possible to knock a considerable number of cycles out of the
 symbol name hash function.
 
 	David
 
 [1] Unfortunately it can't be (or at least isn't) saved in the elf symbol table.
 
 -- 
 David Laight: david@l8s.co.uk