Subject: Re: RelCache (aka ELF prebinding) news
To: Thor Lancelot Simon <tls@rek.tjls.com>
From: Bang Jun-Young <junyoung@netbsd.org>
List: tech-userlevel
Date: 12/04/2002 17:22:55
On Tue, Dec 03, 2002 at 11:02:41PM -0500, Thor Lancelot Simon wrote:
> On Wed, Dec 04, 2002 at 12:30:46PM +0900, Bang Jun-Young wrote:
> > > 
> > > Even a perfect 32-bit identifier isn't good enough, by itself.  I strongly
> > > suspect that a 32-bit identifier stamped into the file, plus information
> > > from the metadata, probably is, in the real world.  The 64-bit identifier
> > > that Bang is using now, consisting of the CRC-32 of the file followed by
> > > the Adler-32 of the file, is probably good enough all by itself but it
> > > seems silly to not use the file size and metadata to further reduce the
> > > chance of collisions no matter how the identifier is chosen.
> > 
> > I must have been too sleepy last night (and you and Jason were right ;-) Okay, I
> > will use the file size as well.
> > 
> > So I will use the following values for identification:
> > 
> >  - 32 bit CRC32
> >  - 32 bit Adler32
> >  - file size
> >  - base address (determined by ld.elf_so for each process)
> 
> So, just to summarize:
> 
> 1) Only root can write to the relocation cache area.  This eliminates
>    the security concerns raised by Mouse.
> 
> 2) You prefer not to include certain filesystem metadata that would
>    invalidate cache entries if a library were moved or renamed.
> 
> Thus yielding the current approach.
> 
> The only thing left that I don't quite grasp is why you aren't using
> the ELF object name, in addition to the base address and file size.
> That would reduce the set of possible collisions enormously.

The file name is a string. Strings vary in length, so they are more
difficult and time-consuming to handle than a fixed-length value. It's
no surprise that ELF itself uses a hash table to look up symbols by name.

> 
> I'll point out once more two possibly minor things:
> 
> 1) As Jonathan has pointed out, a Fletcher sum is probably better than
>    an Adler sum for this purpose.  If you want an implementation, I'm
>    sure he or I can send you one (or you can write one yourself in a
>    couple of minutes, the Fletcher checksum is *really* simple).

Okay, I'll have a look at Fletcher checksum too.

> 
> 2) If instead of using hashes at all, you used the dev, ino, gen
>    triple for the library, plus the ctime and mtime, you'd have to
>    re-prebind after restoring, but you could at least be sure that
>    if the _kernel_ thought it was the same file, so would you; and
>    as a few people have pointed out, nobody actually moves shared
>    libraries around with any kind of frequency, and other Unix
>    prebinding systems all require re-prebinding if you do, so maybe
>    that's not the worst approach in the world.

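By not using dev, ino, mtime, ctime, etc., you can ship prebound
binaries in the base OS distribution. That would not be possible with
implementations keyed on such metadata, because inode numbers and
timestamps are assigned when the files are installed on each machine.
I think it _is_ a win.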

> 
> Anyway, that's really all I have to say about the subject.  Thank
> you _very_ much for spending so much time listening to suggestions,
> and for doing the work in the first place.

You're welcome. ;-)

Jun-Young

-- 
Bang Jun-Young <junyoung@netbsd.org>