On Tue, Dec 03, 2002 at 02:27:02AM -0500, Thor Lancelot Simon wrote: > On Tue, Dec 03, 2002 at 11:29:19AM +0900, Bang Jun-Young wrote: > > On Mon, Dec 02, 2002 at 06:50:58AM -0500, Ty Sarna wrote: > > > > > > For example, I just compiled bin/cat from the 1.6 sources. I download > > > bin/cat from ftp.netbsd.org and it is identical to mine. There are a > > > zillion small ways in which this is useful. Don't screw it up. > > > > (You do have a point ;-) > > > > I agree. Timestamp in ELF binary doesn't seem to be a good idea, either. > > > > Now I'm wondering: is there any 64 bit long checksum available? Or is > > it safe to two different 32 bit checksum methods for a single file? > > I'm curious how you decided that the likelihood of a random collision was > too great with a 32-bit sum. for (;;) { "I'm going to use CRC32." "No, it's too weak. Use MD5." "Okay, I will use MD5." "It's too slow. Use random numbers." "Random number is not good. CRC32 is just enough." } > I started to make some rather simplistic > mathematical arguments here but after investigating the literature on > CRC error rates for about an hour, all I'm really sure of is that my > calculations were wrong. > > However, there are significantly better 32-bit CRC polynomials than the > usual one; the one used by iSCSI is particularly good, against the kind > of errors seen in data communications links at least. Whether it is > better for this purpose, I couldn't really say, because I cannot answer > even the weak question "given two random bitstrings of length N, what is > the probability that they will have the same 32-bit CRC result with > polynomial M", much less "given two ELF object files of length N, what is > the probability that they will have the same 32-bit CRC result with > polynomial M", which is what we care about (since the relcache can > obviously use the file size to eliminate all other collisions). > > Note that the modification time of the file can also be used to further > reduce the number of random collisions, quite possibly close enough to > zero that nobody will ever see one -- but I can't _prove_ this, or even > reason about it with any real basis except for a Very Strong Hunch. > Anyone feeling mathematical? > > I really think that a 32-bit sum suffices unless you really believe that > the inputs are likely to be such that CRC-32 will produce poor results. But > if you really want more bits, you could use both CRC-32 and another 32-bit > (or shorter) sum such as an Adler-32 or Fletcher-32 sum, or just use a > longer sum that is faster to compute than CRC, such as a 40, 48, or 64-bit > Fletcher sum. > > Note that the Fletcher-32 sum should be significantly faster to compute > than the Adler-32 sum, and should serve just fine for this purpose, so if > you decide to use CRC32 and something else, it would be a good choice. > > For some good references on checksums and CRCs, see: > > http://www.haifa.il.ibm.com/satran/ips/draft-sheinwald-iscsi-crc-01.txt > http://www.diskdrive.com/reading-room/white-papers/Definitive_CRC32C_Paper.pdf > > http://www.ece.cmu.edu/~ips/archive/draft-cavanna-iscsi-crc-vs-cksum-01.txt > > Note that the last of those three references contains some mistakes in its > reasoning about the probability of various kinds of checksum failures, but > is still a good general overview of the different kinds of sums available, > and is less mathematically daunting than the first two. > > Also note that the commonly-used "CRC-64", originally from Matt Dillon's > "diabolo" code, I believe, uses a rather poorly chosen polynomial (to begin > with, it's really only 56 bits -- one end is all zeroes!) and is not a great > choice. Thanks for your comments. Conclusion: I will be back to using CRC32. Jun-Young -- Bang Jun-Young <junyoung@netbsd.org>