Subject: Re: sets compression
To: Simon Burge <firstname.lastname@example.org>
From: Daniel Carosone <email@example.com>
Date: 11/07/2005 12:30:23
Content-Type: text/plain; charset=us-ascii
On Mon, Nov 07, 2005 at 12:06:16PM +1100, Simon Burge wrote:
> rzip can't decompress to stdout, making it harder to use for extracting
> sets. We'd need to have separate "decompress the set" and "extract the
> set" stages.
Huh. I knew about the compress stage, but I must admit I hadn't gotten
around to noticing (or discovering the hard way) that this applied to
decompression, too. I've been using it to compress .iso files so far,
and this hadn't been an issue. Bummer (for this usage).
I agree that we need streaming decompression for installs, so never
mind about my suggestion.
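To make "streaming decompression" concrete, here is a rough sketch in
Python. It is only an illustration, not how the install tools do it.
The helper names (list_members_streaming, make_demo_set) and the
member name are made up for the example. The point is that a gzip'ed
tar can be walked in one sequential pass from a pipe or stdin, with no
seeking and no separate "decompress the set" stage:

```python
import io
import tarfile


def list_members_streaming(stream):
    """Walk a gzip-compressed tar archive from a sequential stream.

    Mode "r|gz" tells tarfile to treat the input as a pure stream:
    it only calls read() and never seeks, so a pipe or stdin works.
    """
    names = []
    with tarfile.open(fileobj=stream, mode="r|gz") as archive:
        for member in archive:
            names.append(member.name)
    return names


def make_demo_set():
    """Build a tiny in-memory .tar.gz so the sketch is self-contained."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as archive:
        data = b"hello\n"
        info = tarfile.TarInfo(name="etc/demo.conf")  # hypothetical member
        info.size = len(data)
        archive.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(list_members_streaming(make_demo_set()))
```

A compressor that can only write its output to a file, never to
stdout, can't sit in a pipeline like this, which is exactly the
problem for installs.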
> It also uses _much_ more memory on compression - an otherwise idle box
> here with 256MB of RAM started swapping when trying to rzip a recent
> i386 comp set. After 8 minutes it had made a 50 byte output file, while
> a similar speed box with 2GB of RAM took 45 seconds to rzip the same
It uses more memory, certainly, but don't read too much into the
initial data production rate. It's quite variable, and seems to go in
several stages. You probably hadn't gotten past the initial
It also seems to be heavily disk/seek bound rather than cpu bound, for
the bulk of the compression work once the initial mapping is done.
Unsurprisingly, it would appear to be selectively reading a big memory
buffer of blocks from all over the file, and then compressing that
quite quickly and writing it out before reading again.
This makes for some new and different tradeoffs with respect to
multiple jobs or other processes on a machine.
> rzip however does produce a smaller compressed file in this case:
> 15104 -rw-r--r-- 1 simonb simonb - 15451684 Nov 7 11:57 comp.tar.rz
> 18680 -rw-r--r-- 1 simonb simonb - 19097153 Nov 7 11:56 comp.tar.bz2
> 23016 -rw-r--r-- 1 simonb simonb - 23542910 Oct 24 23:00 comp.tar.gz
> 82992 -rw-r--r-- 1 simonb simonb - 84920320 Nov 7 11:55 comp.tar
Yes. I've seen considerably better results than this, for some files,
too. It certainly gives size benefits in return for its other
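For scale, the byte counts in the listing quoted above can be turned
into ratios with a few lines of Python. This is just arithmetic on the
quoted numbers; the function name compression_ratios is made up for
the example:

```python
# Byte sizes copied from the directory listing quoted above.
ORIGINAL_SIZE = 84920320  # comp.tar
COMPRESSED_SIZES = {
    "comp.tar.rz": 15451684,
    "comp.tar.bz2": 19097153,
    "comp.tar.gz": 23542910,
}


def compression_ratios(sizes, original):
    """Return compressed size as a fraction of the original, per file."""
    return {name: size / original for name, size in sizes.items()}


if __name__ == "__main__":
    ratios = compression_ratios(COMPRESSED_SIZES, ORIGINAL_SIZE)
    for name, ratio in ratios.items():
        print(f"{name}: {ratio:.1%} of the uncompressed size")
```

That works out to roughly 18.2% (rzip), 22.5% (bzip2) and 27.7% (gzip)
of the uncompressed tar, so for this set the rzip output is about 19%
smaller than bzip2's and about 34% smaller than gzip's.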
> Personally, I think we should stick with gzip for sets, but maybe we
> have an option for using bzip2 so people who want to use it locally can?
That would be good.