Subject: Compressed filesystem (was Re: CryptoGraphic Disk.)
To: Hubert Feyrer <hubert.feyrer@informatik.fh-regensburg.de>
From: Todd Vierling <tv@pobox.com>
List: tech-security
Date: 10/10/2002 16:35:07
On Sun, 6 Oct 2002, Hubert Feyrer wrote:

: Uhu. I understand writing esp. in the middle of files is interesting.
: But for a start, a read-only filesystem that was "pre-compressed" would be
: enough - what I have in mind is a CD that has more than 650MB.

Warning: compression internals discussion ahead.  Watch for falling bits.

The biggest problem with any on-the-fly mechanism, even for reading, is
seeking *backwards*.  Most compression algorithms are highly directional
(towards the end of the stream).  Without some mechanism for handling
backwards motion, any such seek would require a rewind() and restart of
decompression from the beginning.  That would be really ugly and obnoxiously
slow for any file > 256KB or so.

Now, this could be accomplished with gzip/zlib compression, provided you
have some layer to intercept lseek(), through crafty finagling of zlib
internals (by stashing the decompressor's state machine and Lempel-Ziv
window periodically as "smart restore points").  That's memory intensive
depending on the compression level (-z9 is a 32KB LZ window, I believe), and
how your backoff algorithm chooses when to throw away saved state.

The other alternative, which has been used in some compression schemes, is
to partition the uncompressed blocks so that there's a "clean" state machine
at known points throughout the stream.  Typically this consists of
independently compressed blocks of a fixed uncompressed size, with some sort
of periodic index of the blocks (such as a N-block table between each N data
blocks, or a pointer to the next block at the start of each, or similar).
Obviously, the compression achieved here is far less than you'd get from a
complete-stream gzip pass, but it doesn't have nearly the requirements of
changing horses^W^Wrestarting gzip decompression in the middle of a stream.

The former case is a candidate for a LD_PRELOAD overlay of libc, and would
never belong in kernel space at all, though it has the highest degree of
compatibility (given that it would handle plain gzipped files).  The latter
case lends itself much better to a layered filesystem, and could be
kernelized depending on the memory requirements of the compression scheme
used (LZO, for instance, is very low on memory consumption).

Oh, I haven't even addressed writing here, yet.  All this is read-only.  8-)

-- 
-- Todd Vierling <tv@pobox.com>