Subject: Re: cgd and replay
To: Ted Unangst <tedu@zeitbombe.org>
From: Roland Dowdeswell <elric@imrryr.org>
List: tech-security
Date: 05/11/2005 09:57:53
On 1115774659 seconds since the Beginning of the UNIX epoch
Ted Unangst wrote:
>
>it's my understanding that cgd doesn't provide any protection against 
>replay or other injection attacks.  this wasn't really addressed in the 
>paper, except in passing.  was it considered and rejected as outside 
>problem space?  too difficult?  essentially, does anybody care and how 
>much?  if i wanted to authenticate the data on the disk, what's the best 
>approach?
>
>attack scenario is kinda like this.  some kind of network where the users 
>trust their laptops, but possibly not the large usb drive left in the 
>office over night, and want to detect tampering. 

When I wrote CGD I was mostly concerned with the compromise of data
rather than injection of data.  We've discussed integrity and my
preference would be to provide for integrity at a higher level
where the code has some understanding of the structure of the data
that it is trying to protect.  This allows one to provide integrity
in a much more efficient fashion.  In fact, what I'd like to see
at some point is the replacement of simple checksums in LFS with
HMACs (along with HMACing the data which is not already protected).
This would provide the integrity protection with the smallest cost
and overhead.  Of course, this would only work for LFS.
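
To make the idea concrete, here is a rough sketch of what replacing
a simple checksum with an HMAC might look like, using OpenSSL's
HMAC() with SHA1 purely for illustration.  The function names and
layout are made up for this sketch and are not actual LFS code:

/*
 * Sketch only: compute/verify an HMAC-SHA1 over a segment's data
 * in place of a simple checksum.  The names here are hypothetical;
 * real LFS keeps its checksums in the segment summary.
 */
#include <string.h>
#include <openssl/evp.h>
#include <openssl/hmac.h>
#include <openssl/sha.h>

#define HMAC_LEN SHA_DIGEST_LENGTH	/* 20 bytes for HMAC-SHA1 */

/* Compute an HMAC over a buffer with a per-filesystem secret key. */
static void
segment_hmac(const unsigned char *key, size_t keylen,
    const unsigned char *data, size_t datalen,
    unsigned char out[HMAC_LEN])
{
	unsigned int outlen = HMAC_LEN;

	HMAC(EVP_sha1(), key, (int)keylen, data, datalen, out, &outlen);
}

/* Verification: recompute and compare in constant time. */
static int
segment_hmac_ok(const unsigned char *key, size_t keylen,
    const unsigned char *data, size_t datalen,
    const unsigned char expect[HMAC_LEN])
{
	unsigned char got[HMAC_LEN];
	unsigned char diff = 0;
	size_t i;

	segment_hmac(key, keylen, data, datalen, got);
	for (i = 0; i < HMAC_LEN; i++)
		diff |= got[i] ^ expect[i];
	return (diff == 0);
}

The key would be a per-filesystem secret; only the checksum fields
on disk would grow from 4 to 20 bytes.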

Another thought that I've been tossing around is whether a weaker
condition can be bought without incurring the cost of additional
storage and the transactionality costs that come with it.
Specifically, would it be possible to prevent predictable modification
of the plaintext even if the OS cannot detect attempted modification?
As a simple example, if we use a cipher with a cipherblock size of
512 bytes, then an attacker cannot make a predictable modification
to any block---any change will turn the entire block into [effectively]
random data.  Although you would not be able to detect that the block
has been changed in the general case, the attacker would not actually
[in general] be able to achieve predictable results either.  (And
for the most part, files are highly structured and it would be
apparent that something fishy was going on.)
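
To illustrate the contrast, here is a small demonstration using
OpenSSL's AES-128-CBC as a stand-in for a conventional narrow
cipherblock: a one-bit change to a ciphertext block garbles that
block but flips exactly the same bit in the next plaintext block,
which is precisely the kind of predictable modification that a
512 byte cipherblock would deny the attacker.

/*
 * Demonstration of the contrast with a small (16 byte) cipherblock:
 * in CBC, flipping a bit in ciphertext block i garbles plaintext
 * block i but flips the *same* bit in plaintext block i+1 -- a
 * predictable modification.  A 512 byte cipherblock would leave the
 * attacker with nothing but a sector's worth of garbage.
 * Error checking is omitted for brevity.
 */
#include <stdio.h>
#include <string.h>
#include <openssl/evp.h>

int
main(void)
{
	unsigned char key[16] = { 0x2b };	/* fixed demo key */
	unsigned char iv[16] = { 0 };		/* fixed demo IV */
	unsigned char pt[64], ct[64 + 16], out[64 + 16];
	int len, total;
	EVP_CIPHER_CTX *ctx;

	memset(pt, 'A', sizeof(pt));

	/* Encrypt 64 bytes (4 cipher blocks, no padding). */
	ctx = EVP_CIPHER_CTX_new();
	EVP_EncryptInit_ex(ctx, EVP_aes_128_cbc(), NULL, key, iv);
	EVP_CIPHER_CTX_set_padding(ctx, 0);
	EVP_EncryptUpdate(ctx, ct, &len, pt, sizeof(pt));
	EVP_EncryptFinal_ex(ctx, ct + len, &len);
	EVP_CIPHER_CTX_free(ctx);

	/* Attacker flips one bit in ciphertext block 1. */
	ct[16] ^= 0x01;

	/* Decrypt the tampered ciphertext. */
	ctx = EVP_CIPHER_CTX_new();
	EVP_DecryptInit_ex(ctx, EVP_aes_128_cbc(), NULL, key, iv);
	EVP_CIPHER_CTX_set_padding(ctx, 0);
	EVP_DecryptUpdate(ctx, out, &total, ct, sizeof(pt));
	EVP_DecryptFinal_ex(ctx, out + total, &len);
	EVP_CIPHER_CTX_free(ctx);

	/* Plaintext block 1 is now garbage, but block 2 differs from
	   'AAAA...' in exactly one predictable bit. */
	printf("block 2 byte 0: %02x (predictably 'A' ^ 0x01)\n", out[32]);
	return 0;
}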

Thor suggested to me recently that one could avoid the transactionality
costs by simply formatting the disk to have 532 byte blocks and
having the pseudo-disk present 512 byte blocks to the upper layers.
This would most likely involve quite a bit of modification to
the device driver code to support non-power-of-two block sizes
but would be the most efficient way to do it.  It would also make
it a little difficult to stack pseudo-disks.
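
If we assume the extra 20 bytes are an HMAC-SHA1 tag (512 + 20 =
532), the layout is about as simple as it gets.  The names below
are purely illustrative:

/*
 * Sketch of the 532 byte sector layout Thor's suggestion implies,
 * assuming a 20 byte HMAC-SHA1 tag (512 + 20 = 532).  Names are
 * illustrative only.
 */
#include <stdint.h>

#define DATA_BYTES	512
#define TAG_BYTES	20	/* HMAC-SHA1 output */

struct sector532 {
	uint8_t	data[DATA_BYTES];	/* presented to upper layers */
	uint8_t	tag[TAG_BYTES];		/* HMAC over data (and sector #) */
};
/* sizeof(struct sector532) == 532; no extra seeks or transactions,
 * since the data and its tag always travel in the same physical
 * sector. */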

If we did want to provide a block level integrity check, then it
probably makes sense to write it as a different pseudo-disk to keep
the complexity in a different place.  This also raises the question,
though, of key generation for the integrity checking.  And general
configuration, which I'd have to put a bit of thought into.  It
would be unfortunate to have to enter two separate pass phrases.
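
One way to avoid two pass phrases would be to derive both keys from
a single pass phrase, e.g. with PKCS#5 PBKDF2 as cgdconfig already
does for its key generation, and split the output.  A rough sketch
only, not cgd's actual key handling:

/*
 * Derive a single long key with PKCS#5 PBKDF2 and split it into an
 * encryption key and an HMAC key.  The split shown here is an
 * illustration; parameters and names are hypothetical.
 */
#include <string.h>
#include <openssl/evp.h>

#define ENC_KEYLEN	32	/* e.g. AES-256 */
#define MAC_KEYLEN	32	/* HMAC key */

static int
derive_keys(const char *pass, const unsigned char *salt, int saltlen,
    unsigned char enc_key[ENC_KEYLEN], unsigned char mac_key[MAC_KEYLEN])
{
	unsigned char buf[ENC_KEYLEN + MAC_KEYLEN];

	if (PKCS5_PBKDF2_HMAC_SHA1(pass, (int)strlen(pass),
	    salt, saltlen, 100000, sizeof(buf), buf) != 1)
		return -1;

	memcpy(enc_key, buf, ENC_KEYLEN);
	memcpy(mac_key, buf + ENC_KEYLEN, MAC_KEYLEN);
	return 0;
}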

The implementation could be something like this (a rough sketch in
C follows the list):

	1.  define an HMAC entity for each disk block containing:

		i.    two HMACs,
		ii.   where the order is important: the first
		      is the new HMAC, the second is the old,

	2.  pack the HMAC entities into block-sized chunks and
	    intersperse those blocks among the data blocks on disk.

	3.  each write would have to:

		i.    read the HMAC entity,
		ii.   write a new HMAC entity with the new HMAC
		      first and the previous (now old) HMAC second,
		iii.  write the disk block.

	4.  verification would allow either of the HMACs to be
	    considered valid.

	5.  after a crash the HMAC entities would need to be
	    fixed:

		i.    read HMAC entity,
		ii.   read corresponding disk block,
		iii.  determine which HMAC is valid, and
		iv.   if the second HMAC is valid then
		      rewrite the HMAC entity with the
		      second HMAC first

	    This step could be done in a number of different ways,
	    and should be able to be part of the normal write path.
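
Roughly, steps 1 through 3 might look like the following in C,
assuming HMAC-SHA1 tags and hypothetical raw_read()/raw_write()
primitives from the layer below; it glosses over the packing of
multiple entities into a single block (step 2):

/*
 * Sketch of steps 1-3 above, assuming HMAC-SHA1 (20 byte tags) and
 * hypothetical lower-level I/O helpers; not actual pseudo-disk code.
 */
#include <stdint.h>
#include <string.h>
#include <openssl/evp.h>
#include <openssl/hmac.h>
#include <openssl/sha.h>

#define SECSIZE	512
#define TAGLEN	SHA_DIGEST_LENGTH	/* 20 */

/* Step 1: the HMAC entity for a disk block: new tag first, old second. */
struct hmac_entity {
	uint8_t	new_hmac[TAGLEN];
	uint8_t	old_hmac[TAGLEN];
};

/* Hypothetical primitives provided by the underlying disk layer. */
int raw_read(uint64_t blkno, void *buf, size_t len);
int raw_write(uint64_t blkno, const void *buf, size_t len);
/* Step 2 (simplified): map a data block to the block holding its entity. */
uint64_t hmac_blkno(uint64_t data_blkno);

static void
block_hmac(const uint8_t *key, size_t keylen, uint64_t blkno,
    const uint8_t data[SECSIZE], uint8_t tag[TAGLEN])
{
	unsigned int len = TAGLEN;
	HMAC_CTX *ctx = HMAC_CTX_new();

	/* Bind the tag to the block number to prevent block relocation. */
	HMAC_Init_ex(ctx, key, (int)keylen, EVP_sha1(), NULL);
	HMAC_Update(ctx, (const uint8_t *)&blkno, sizeof(blkno));
	HMAC_Update(ctx, data, SECSIZE);
	HMAC_Final(ctx, tag, &len);
	HMAC_CTX_free(ctx);
}

/* Step 3: one logical write = read entity, write entity, write data. */
static int
integrity_write(const uint8_t *key, size_t keylen, uint64_t blkno,
    const uint8_t data[SECSIZE])
{
	struct hmac_entity ent;

	if (raw_read(hmac_blkno(blkno), &ent, sizeof(ent)) != 0)	/* 3.i */
		return -1;
	memcpy(ent.old_hmac, ent.new_hmac, TAGLEN);			/* 3.ii */
	block_hmac(key, keylen, blkno, data, ent.new_hmac);
	if (raw_write(hmac_blkno(blkno), &ent, sizeof(ent)) != 0)
		return -1;
	return raw_write(blkno, data, SECSIZE);				/* 3.iii */
}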

The main problem with this is that it turns one disk write into
one read and two writes, which is a little suboptimal.  With the
LFS modifications that I proposed above, the number of disk
interactions would remain exactly the same (and I do cling to the
belief that one day LFS will work.  :-)

--
    Roland Dowdeswell                      http://www.Imrryr.ORG/~elric/