Subject: Re: RelCache (aka ELF prebinding) news
To: Thor Lancelot Simon <tls@rek.tjls.com>
From: Bang Jun-Young <junyoung@netbsd.org>
List: tech-kern
Date: 12/03/2002 16:52:18
On Tue, Dec 03, 2002 at 02:27:02AM -0500, Thor Lancelot Simon wrote:
> On Tue, Dec 03, 2002 at 11:29:19AM +0900, Bang Jun-Young wrote:
> > On Mon, Dec 02, 2002 at 06:50:58AM -0500, Ty Sarna wrote:
> > > 
> > > For example, I just compiled bin/cat from the 1.6 sources. I download
> > > bin/cat from ftp.netbsd.org and it is identical to mine. There are a
> > > zillion small ways in which this is useful. Don't screw it up.
> > 
> > (You do have a point ;-)
> > 
> > I agree. Timestamp in ELF binary doesn't seem to be a good idea, either.
> > 
> > Now I'm wondering: is there any 64 bit long checksum available? Or is
> > it safe to two different 32 bit checksum methods for a single file?
> 
> I'm curious how you decided that the likelihood of a random collision was
> too great with a 32-bit sum.

for (;;) {
	"I'm going to use CRC32."

	"No, it's too weak. Use MD5."

	"Okay, I will use MD5."

	"It's too slow. Use random numbers."

	"Random number is not good. CRC32 is just enough."
}

> I started to make some rather simplistic
> mathematical arguments here but after investigating the literature on
> CRC error rates for about an hour, all I'm really sure of is that my
> calculations were wrong.
> 
> However, there are significantly better 32-bit CRC polynomials than the
> usual one; the one used by iSCSI is particularly good, against the kind
> of errors seen in data communications links at least.  Whether it is
> better for this purpose, I couldn't really say, because I cannot answer
> even the weak question "given two random bitstrings of length N, what is
> the probability that they will have the same 32-bit CRC result with
> polynomial M", much less "given two ELF object files of length N, what is
> the probability that they will have the same 32-bit CRC result with
> polynomial M", which is what we care about (since the relcache can
> obviously use the file size to eliminate all other collisions).
> 
> Note that the modification time of the file can also be used to further
> reduce the number of random collisions, quite possibly close enough to
> zero that nobody will ever see one -- but I can't _prove_ this, or even
> reason about it with any real basis except for a Very Strong Hunch.  
> Anyone feeling mathematical?
> 
> I really think that a 32-bit sum suffices unless you really believe that
> the inputs are likely to be such that CRC-32 will produce poor results.  But
> if you really want more bits, you could use both CRC-32 and another 32-bit
> (or shorter) sum such as an Adler-32 or Fletcher-32 sum, or just use a
> longer sum that is faster to compute than CRC, such as a 40, 48, or 64-bit
> Fletcher sum.
> 
> Note that the Fletcher-32 sum should be significantly faster to compute
> than the Adler-32 sum, and should serve just fine for this purpose, so if
> you decide to use CRC32 and something else, it would be a good choice.
> 
> For some good references on checksums and CRCs, see:
> 
> http://www.haifa.il.ibm.com/satran/ips/draft-sheinwald-iscsi-crc-01.txt
> http://www.diskdrive.com/reading-room/white-papers/Definitive_CRC32C_Paper.pdf
> 
> http://www.ece.cmu.edu/~ips/archive/draft-cavanna-iscsi-crc-vs-cksum-01.txt
> 
> Note that the last of those three references contains some mistakes in its
> reasoning about the probability of various kinds of checksum failures, but
> is still a good general overview of the different kinds of sums available,
> and is less mathematically daunting than the first two.
> 
> Also note that the commonly-used "CRC-64", originally from Matt Dillon's
> "diabolo" code, I believe, uses a rather poorly chosen polynomial (to begin
> with, it's really only 56 bits -- one end is all zeroes!) and is not a great
> choice.

Thanks for your comments.

Conclusion: I will be back to using CRC32.

Jun-Young

-- 
Bang Jun-Young <junyoung@netbsd.org>