tech-install archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: HTTPS trust anchors in sysinst



> Date: Sun, 3 Sep 2023 12:16:47 -0400 (EDT)
> From: Mouse <mouse%Rodents-Montreal.ORG@localhost>
> 
> > Now suppose, after the string x is determined, I pick a 32-byte
> > string r uniformly at random, compute h = MD5(r || x), and send you
> > (r, h).  You download the file, again giving some string x', and you
> > accept it only if MD5(r || x') ?= h.
> 
> > Nobody has shown any way to pass a forgery through this criterion.
> > That's what I mean by randomized MD5.
> 
> I thought MD5 was not only weak for collision resistance but also weak
> for second-preimage resistance.  Is that wrong?

If you have a reference for that I would be excited to see it!

There are some reports in the literature that may be misread as
breaking MD5 preimage security, like this oft-cited one:

   Yu Sasaki and Kazumaro Aoki, `Finding Preimages in MD5 Faster Than
   Exhaustive Search', in Antoine Joux (ed.), Advances in Cryptology -
   EUROCRYPT 2009, Springer LNCS 5479, pp. 134-152.
   https://link.springer.com/chapter/10.1007/978-3-642-01001-9_8

Quoting from the abstract:

   In this paper, we present the first cryptographic preimage attack
   on the full MD5 hash function. This attack, with a complexity of
   2^116.9, generates a pseudo-preimage of MD5 and, with a complexity
   of 2^123.4, generates a preimage of MD5. The memory complexity of
   the attack is 2^45 x 11 words.

That is, they show that, by powering and managing 1408 terabytes of
storage, you can find an MD5 preimage in average time about 1/12 what
a serialized generic search would need (2^127 calls to MD5).

But I have a technique for doing better than that -- say, 1/16 the
time of a serialized generic search, at substantially lower real-world
cost.  How does it work?  Just run 16 generic search circuits in
parallel.

Bonus: my technique has essentially zero extra storage costs, or
communication costs (which the paper sweeps under the rug), and it
scales to as many circuits as you can fabricate and power.

In other words, this paper doesn't actually demonstrate any cost
reduction in MD5 preimage search.  The baseline is not a _serialized_
generic search with cost measured in time; the baseline is a
_parallel_ generic search with cost measured in dollars, for which
area*time is a reasonable proxy.  The paper may have some
theoretically interesting content but it doesn't affect MD5 preimage
security, not even theoretically.

(That's not to say I recommend putting MD5 into new designs.  It's
easy to get protocols wrong so you accidentally depend on collision
resistance when you didn't mean to.  It raises all kinds of auditor
eyebrows, for good reasons.  It doesn't get much scrutiny any more
because the collision resistance is broken.  But it's an option for a
conservative protocol like this, if the performance benefit outweighs
the cost of scraping auditor eyebrows off the ceiling.)

> Also, is that under the assumption that the putative forger gets to see
> r, or doesn't get to see r?  If the forger gets to compute x' knowing r
> and x, I'd be surprised if that is much more difficult than just the
> ordinary second-preimage problem.

The forger gets to choose x first, and then gets to see the uniform
random r once x is fixed.  After that, the forger has a chance to
attempt a forgery x' of their choice.  The forger wins if MD5(r || x')
= MD5(r || x).


Home | Main Index | Thread Index | Old Index