Subject: Re: Compressed filesystem (was Re: CryptoGraphic Disk.)
To: Daniel Carosone <dan@geek.com.au>
From: Todd Vierling <tv@pobox.com>
List: tech-security
Date: 10/10/2002 17:17:02

On Fri, 11 Oct 2002, Daniel Carosone wrote:

: How big is the state machine?

For gzip streams, the LZ window (up to 32KB; IIRC zlib lets it be configured
down to 256 bytes, independent of the -1 through -9 level), plus the Huffman
code tables (data-dependent, but typically small, <1KB I believe).  The LZ
window makes gzip's state machine rather expensive to save and restore at
anything near the full 32KB size.

For partitioned uncompressed blocks (the other alternative), zero.  The
state machine is assumed to be empty at the partition points, so you get a
clean slate.  (The point of the compressed-block indexes is to allow seeking
in either direction very quickly in order to find the appropriate point in
the uncompressed stream.)
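
A minimal sketch of the partitioned approach, in Python with the standard
zlib module.  The names pack and read_at and the 64KB partition size are
assumptions made for illustration, not part of any existing format.  Each
partition is compressed on its own, so the only thing a reader needs is the
table of compressed offsets:

```python
import zlib

BLOCK = 64 * 1024  # uncompressed bytes per partition (an assumed value)


def pack(data, block=BLOCK):
    """Compress each partition independently; return (blob, index)."""
    blob = bytearray()
    index = []  # index[i] = offset of compressed partition i within blob
    for start in range(0, len(data), block):
        index.append(len(blob))
        blob += zlib.compress(data[start:start + block])
    index.append(len(blob))  # sentinel: end of the last partition
    return bytes(blob), index


def read_at(blob, index, offset, length, block=BLOCK):
    """Return up to `length` uncompressed bytes starting at `offset`."""
    out = bytearray()
    i = offset // block    # partition holding the first requested byte
    skip = offset % block  # bytes to discard at the front of that partition
    while length > 0 and i < len(index) - 1:
        # Every partition starts from an empty state, so nothing that came
        # before it has to be decompressed or remembered.
        chunk = zlib.decompress(blob[index[i]:index[i + 1]])
        piece = chunk[skip:skip + length]
        out += piece
        length -= len(piece)
        skip = 0
        i += 1
    return bytes(out)
```

The cost is compression ratio: every partition starts with an empty window,
so smaller partitions seek faster but compress worse.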

: You could combine the ideas, at least for read-only media prepared
: for the purpose.  Save the state machine at each of the partition
: points, rather than resetting it.  This could go in an adjunct
: helper file, so the actual files remain standard gzip format.

Note that the LZ window is simply a sliding window of the *uncompressed
data*, used for string repetition, which is what LZ compresses.  So saving
that to an adjunct file or embedded into the stream might not be
particularly useful unless it's spaced very far apart (1MB or more) -- and
in that case, you'll still want to do the in-memory stashing during a
decompress, so you can handle things like seeking forward to the 4.5MB mark
and then backwards 0.3MB efficiently.
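
To make the cost concrete, here is a rough sketch of saving the window at
widely spaced access points, using Python's zlib module.  It works on a raw
deflate stream (no gzip header or trailer), and the function names and the
layout of the saved points are assumptions for illustration only.  Each
point stores up to 32KB of uncompressed data, which is why close spacing
would defeat the purpose:

```python
import zlib

WINDOW = 32 * 1024  # deflate's maximum history distance


def pack_with_points(data, spacing):
    """Compress as one raw deflate stream, recording access points."""
    comp = zlib.compressobj(wbits=-15)  # negative wbits = raw deflate
    blob = bytearray()
    points = []  # (uncompressed offset, compressed offset, saved window)
    for start in range(0, len(data), spacing):
        if start:
            # Z_SYNC_FLUSH byte-aligns the output but keeps the history,
            # so later data may still refer back across this point.
            blob += comp.flush(zlib.Z_SYNC_FLUSH)
            window = data[max(0, start - WINDOW):start]
            points.append((start, len(blob), window))
        blob += comp.compress(data[start:start + spacing])
    blob += comp.flush(zlib.Z_FINISH)
    return bytes(blob), points


def resume(blob, point, length):
    """Decompress up to `length` bytes starting at a saved access point."""
    start, comp_off, window = point
    # Preloading the saved window stands in for everything before `start`.
    d = zlib.decompressobj(wbits=-15, zdict=window)
    return d.decompress(blob[comp_off:], length)
```

The stream itself stays ordinary deflate; the points could live in a
separate helper file, at roughly 32KB of window per point.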

Mind you, excessive forwards and backwards seeking may seem implausible for
everyday applications, but it's not so implausible in the case of UBC-based
accesses of pages.  8-)
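
The in-memory stashing could look something like the following sketch,
again in Python.  CheckpointReader and its parameters are hypothetical; it
leans on the copy() method of zlib decompression objects to snapshot the
whole decoder state, window included, while reading forward:

```python
import zlib


class CheckpointReader:
    """Reads a zlib stream, stashing decompressor snapshots in memory."""

    def __init__(self, blob, spacing=1 << 20, step=16 * 1024):
        self.blob = blob
        self.spacing = spacing  # minimum uncompressed gap between snapshots
        self.step = step        # compressed bytes fed per decompress call
        # Each snapshot: (uncompressed offset, compressed offset, state).
        self.snaps = [(0, 0, zlib.decompressobj())]

    def read_at(self, offset, length):
        # Start from the latest snapshot at or before the requested offset.
        usable = [s for s in self.snaps if s[0] <= offset]
        upos, cpos, snap = max(usable, key=lambda s: s[0])
        d = snap.copy()  # leave the stashed state untouched
        out = bytearray()
        while upos < offset + length and cpos < len(self.blob):
            chunk = d.decompress(self.blob[cpos:cpos + self.step])
            cpos = min(cpos + self.step, len(self.blob))
            # Keep only the part of this chunk inside the requested range.
            lo = max(offset - upos, 0)
            hi = max(offset + length - upos, 0)
            out += chunk[lo:hi]
            upos += len(chunk)
            # Stash a new snapshot once we are far enough past the last one.
            if upos - self.snaps[-1][0] >= self.spacing:
                self.snaps.append((upos, cpos, d.copy()))
        return bytes(out)
```

With snapshots about 1MB apart, a seek to the 4.5MB mark followed by a
0.3MB step backwards restarts from the nearest earlier snapshot rather than
from the start of the stream.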

-- 
-- Todd Vierling <tv@pobox.com>