Subject: New idea on ELF prebinding
To: None <>
From: Bang Jun-Young <>
List: tech-userlevel
Date: 11/22/2002 15:18:28
Hi folks,

Here's a summary of what I have been thinking about ELF prebinding
since people objected a couple of times to my previous, not-so-good
implementation ;-):

- Every binary, whether executable or shared object, has a .csum
  section inserted by ld(1) at link time. It is 32 bits long and
  stores a checksum (CRC32) of the binary.

- Actual prebinding and prerelocation are done by ld.elf_so(1). After
  ld.elf_so(1) loads a binary for the first time, it creates a disk
  file in /usr/libexec/reloc (call it the "cache") and writes all of the
  relocated GOT and PLT sections in memory to the file (along with the
  checksum and other necessary information). On any subsequent execution
  of the same binary, ld.elf_so(1) no longer performs relocation.
  Instead it loads the cache from the previously created disk file and
  compares the cache information with the in-memory data. If they match,
  it patches the GOT/PLT pointers so that they point to locations in the
  cache. If they differ, ld.elf_so(1) does the same job as on the first
  load: it relocates the binary and writes the cache again.

- As time goes by, more and more caches will accumulate in /usr/libexec/reloc.
  If needed, an elfreld(1) daemon regularly checks whether they are still
  valid and removes invalid files. Or you can remove all of them, and
  ld.elf_so(1) will do the first-load job again for each binary it loads.
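
To make the three points above concrete, here is a minimal sketch of the
control flow in Python. It is only a model, not ld.elf_so(1) code. Everything
in it is an assumption for illustration: the cache layout (a 4-byte CRC32
followed by the GOT/PLT image), the ".reloc" file suffix, and the function
names binary_checksum(), relocate(), load() and sweep(). It also simplifies
the validity test to a checksum comparison, whereas the proposal compares
the cache information with the in-memory data.

```python
import os
import struct
import tempfile
import zlib

# Assumed cache layout (not specified in the proposal): a 4-byte
# little-endian CRC32 of the binary, then the relocated GOT/PLT image.
HEADER = struct.Struct("<I")
SUFFIX = ".reloc"  # hypothetical file-name suffix for cache files


def binary_checksum(image):
    """CRC32 of the binary, i.e. the value ld(1) would store in .csum."""
    return zlib.crc32(image) & 0xFFFFFFFF


def relocate(image):
    """Placeholder for real relocation; returns a fake GOT/PLT image."""
    return bytes(b ^ 0xFF for b in image[:16])


def load(cache_dir, name, image):
    """Return (got_plt, used_cache) for one load of the binary `name`."""
    csum = binary_checksum(image)
    path = os.path.join(cache_dir, name + SUFFIX)
    try:
        with open(path, "rb") as f:
            data = f.read()
    except FileNotFoundError:
        data = b""
    if len(data) >= HEADER.size and HEADER.unpack_from(data)[0] == csum:
        # Cache matches the binary: skip relocation and reuse the image.
        return data[HEADER.size:], True
    # First load, or stale cache: relocate and (re)write the cache.
    got_plt = relocate(image)
    with open(path, "wb") as f:
        f.write(HEADER.pack(csum) + got_plt)
    return got_plt, False


def sweep(cache_dir, installed):
    """elfreld-style pass: remove caches that match no installed binary.

    `installed` maps a binary name to its current image (bytes).
    Returns the names whose cache files were removed.
    """
    removed = []
    for entry in sorted(os.listdir(cache_dir)):
        if not entry.endswith(SUFFIX):
            continue
        name = entry[:-len(SUFFIX)]
        path = os.path.join(cache_dir, entry)
        with open(path, "rb") as f:
            head = f.read(HEADER.size)
        image = installed.get(name)
        valid = (image is not None
                 and len(head) == HEADER.size
                 and HEADER.unpack(head)[0] == binary_checksum(image))
        if not valid:
            os.remove(path)
            removed.append(name)
    return removed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as cache_dir:
        image = b"\x7fELF" + bytes(range(32))
        print(load(cache_dir, "ls", image)[1])         # False: cache created
        print(load(cache_dir, "ls", image)[1])         # True: cache reused
        print(load(cache_dir, "ls", image + b"!")[1])  # False: binary changed
```

Note that removing every file in the cache directory is always safe in this
model: the next load() simply takes the first-load path again.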

Advantages of this method include:

- Minimal modification to the binary. Only .csum is inserted, and it is
  ignored by an old ld.elf_so(1).

- No additional executable is required (elfreld(1) is fully optional).

- It doesn't break ELF semantics. The cache is just an image of the
  in-memory data after relocation is done.

- It is much simpler and can be (significantly) faster than our
  competitor's implementation (prelinking in Red Hat 8.0). You don't
  even have to bother to run prelink against newly created binaries in
  the system regularly. Everything is automagically done by ld.elf_so(1). 

- Better CPU cache utilization is possible, since it is likely
  that all the GOT/PLT entries for a binary and shared objects it
  depends on are stored together in a single page, or at least, adjacent
  in memory. 

Disadvantages of it include:

- When ld.elf_so(1) loads a binary for the first time, it takes more
  time (occasionally much more), since it also creates the cache and
  writes it to disk.

- Security considerations (?).

- (please put your comments here ;-).

Comments would be appreciated, 


Bang Jun-Young <>