Subject: Re: sets compression
To: Simon Burge <simonb@wasabisystems.com>
From: Daniel Carosone <dan@geek.com.au>
List: current-users
Date: 11/07/2005 12:30:23

On Mon, Nov 07, 2005 at 12:06:16PM +1100, Simon Burge wrote:
> rzip can't decompress to stdout, making it harder to use for extracting
> sets.  We'd need to have separate "decompress the set" and "extract the
> set" stages.

Huh.  I knew about this limitation for the compress stage, but I must
admit I hadn't gotten around to noticing (or discovering the hard way)
that it applies to decompression, too.  I've only been using it to
compress .iso files so far, where it hadn't been an issue.  Bummer (for
this usage).

I agree that we need streaming decompression for installs, so never
mind about my suggestion.
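
To illustrate what "streaming" buys us here, a minimal sketch in Python
(not anything from the install tools): with gzip, a set can be read
member by member from a forward-only pipe, with no intermediate
"decompress the set" stage.  The names ForwardOnly, make_sample_set and
list_streamed_members, and the two sample paths, are made up for the
example.

```python
import io
import tarfile


class ForwardOnly:
    """Hypothetical wrapper that exposes only read(), like a pipe or stdin."""

    def __init__(self, raw):
        self._raw = raw

    def read(self, size=-1):
        return self._raw.read(size)


def make_sample_set():
    """Build a tiny gzip-compressed tar in memory (stand-in for a real set)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in (("etc/motd", b"hello\n"), ("bin/true", b"")):
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


def list_streamed_members(stream):
    """Walk a .tar.gz from a non-seekable stream, one member at a time."""
    names = []
    # The "|" in the mode asks tarfile for forward-only stream processing,
    # so decompression and tar parsing happen in a single pass.
    with tarfile.open(fileobj=stream, mode="r|gz") as tar:
        for member in tar:
            names.append(member.name)
    return names


if __name__ == "__main__":
    print(list_streamed_members(ForwardOnly(make_sample_set())))
```

A compressor that can only write its output to a file would need the
whole decompressed tar on disk before extraction could start, which is
exactly the extra stage Simon describes.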

> It also uses _much_ more memory on compression - an otherwise idle box
> here with 256MB of RAM started swapping when trying to rzip a recent
> i386 comp set.  After 8 minutes it had made a 50 byte output file, while
> a similar speed box with 2GB of RAM took 45 seconds to rzip the same
> file.

It uses more memory, certainly, but don't read too much into the
initial data production rate.  It's quite variable, and seems to go in
several stages.  You probably hadn't gotten past the initial
rsync-like mapping.

It also seems to be heavily disk/seek bound rather than cpu bound, for
the bulk of the compression work once the initial mapping is done.
Unsurprisingly, it would appear to be selectively reading a big memory
buffer of blocks from all over the file, and then compressing that
quite quickly and writing it out before reading again.

This makes for some new and different tradeoffs with respect to
multiple jobs or other processes on a machine.

> rzip however does produce a smaller compressed file in this case:
>
> 15104 -rw-r--r--  1 simonb  simonb  - 15451684 Nov  7 11:57 comp.tar.rz
> 18680 -rw-r--r--  1 simonb  simonb  - 19097153 Nov  7 11:56 comp.tar.bz2
> 23016 -rw-r--r--  1 simonb  simonb  - 23542910 Oct 24 23:00 comp.tar.gz
> 82992 -rw-r--r--  1 simonb  simonb  - 84920320 Nov  7 11:55 comp.tar

Yes.  I've seen considerably better results than this, for some files,
too.  It certainly gives size benefits in return for its other
constraints.
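
For reference, the sizes in the listing above can be turned into ratios
with a few lines of Python.  This is only a quick sketch: the byte
counts are copied from Simon's listing, and the names SIZES and ratios
are made up for the example.

```python
# Byte counts taken from the ls output quoted above.
SIZES = {
    "comp.tar": 84920320,
    "comp.tar.gz": 23542910,
    "comp.tar.bz2": 19097153,
    "comp.tar.rz": 15451684,
}


def ratios(sizes, original="comp.tar"):
    """Return each compressed size as a fraction of the uncompressed tar."""
    base = sizes[original]
    return {name: size / base for name, size in sizes.items() if name != original}


if __name__ == "__main__":
    for name, ratio in ratios(SIZES).items():
        print(f"{name:13s} {ratio:6.1%} of the uncompressed size")
```

On these numbers the rzip output is roughly 18% of the original,
against roughly 22% for bzip2 and 28% for gzip.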

> Personally, I think we should stick with gzip for sets, but maybe we
> have an option for using bzip2 so people who want to use it locally can?

That would be good.

--
Dan.