Subject: Re: sets compression
To: Daniel Carosone <dan@geek.com.au>
From: Simon Burge <simonb@wasabisystems.com>
List: current-users
Date: 11/07/2005 12:06:16
Daniel Carosone wrote:

> On Sun, Nov 06, 2005 at 12:55:17PM -0800, John Nemeth wrote:
> > I'm wondering if we should switch to bzip2 for compressing sets in
> > order to reduce image size?
> 
> Or rzip (which we c/should import in that case), which uses the bz2
> library and adds some stuff to do much longer-range compression
> sorting.

rzip can't decompress to stdout, making it harder to use for extracting
sets.  We'd need to have separate "decompress the set" and "extract the
set" stages.
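
To illustrate why streaming matters, here is a minimal sketch of the
one-pass case.  It uses Python's tarfile module as a stand-in for the
usual "zcat | tar" pipeline (this is not how the installer does it, and
the function name is made up for the example).  gzip and bzip2 streams
can be read straight off a pipe like this; rzip has no equivalent in the
Python standard library and would need the separate decompress step
first.

```python
import tarfile


def list_members_from_stream(stream):
    """List the members of a compressed tar read from a stream in one pass.

    `stream` is any binary file object, e.g. sys.stdin.buffer; it does not
    need to be seekable.
    """
    # "r|*" selects tarfile's stream mode with transparent detection of
    # gzip/bzip2 compression, so the archive is never seeked or staged
    # on disk.
    with tarfile.open(fileobj=stream, mode="r|*") as tar:
        return [member.name for member in tar]
```

Fed a set on standard input, that would be called as
list_members_from_stream(sys.stdin.buffer).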

It also uses _much_ more memory when compressing - an otherwise idle box
here with 256MB of RAM started swapping when trying to rzip a recent
i386 comp set.  After 8 minutes it had written only a 50-byte output
file, while a similar-speed box with 2GB of RAM took 45 seconds to rzip
the same file.

rzip does, however, produce a smaller compressed file in this case:

15104 -rw-r--r--  1 simonb  simonb  - 15451684 Nov  7 11:57 comp.tar.rz
18680 -rw-r--r--  1 simonb  simonb  - 19097153 Nov  7 11:56 comp.tar.bz2
23016 -rw-r--r--  1 simonb  simonb  - 23542910 Oct 24 23:00 comp.tar.gz
82992 -rw-r--r--  1 simonb  simonb  - 84920320 Nov  7 11:55 comp.tar
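
Put as ratios, that is about 27.7% of the uncompressed size for gzip,
22.5% for bzip2 and 18.2% for rzip.  A quick sketch of the arithmetic,
using the byte counts from the listing above (the helper name is just
for illustration):

```python
# Sizes in bytes, copied from the listing above.
SIZES = {
    "comp.tar": 84920320,
    "comp.tar.gz": 23542910,
    "comp.tar.bz2": 19097153,
    "comp.tar.rz": 15451684,
}


def ratios(sizes, baseline="comp.tar"):
    """Return each compressed size as a fraction of the uncompressed tar."""
    base = sizes[baseline]
    return {name: size / base
            for name, size in sizes.items() if name != baseline}


if __name__ == "__main__":
    # Print the smallest file first.
    for name, ratio in sorted(ratios(SIZES).items(), key=lambda kv: kv[1]):
        print(f"{name:14s} {ratio:6.1%} of uncompressed size")
```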

Decompression times and memory usage (rss column from ps) are:

	gzip	 1.2sec, max memory  740k (using zcat)
	bzip2	10.8sec, max memory 4276k (using zcat)
	rzip	13.9sec, max memory 6524k
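
For anyone wanting to repeat this sort of measurement, here is a rough
sketch of one way to do it.  Note that this is not how the numbers above
were taken (those came from watching the rss column in ps); it asks the
kernel for the child's peak resident set size via wait4() instead.  The
function name is made up, and the unit of ru_maxrss varies by platform
(kilobytes on most BSDs and Linux, bytes on macOS).

```python
import os
import subprocess
import time


def measure(cmd):
    """Run cmd with stdout discarded; return (seconds, peak RSS) for it."""
    start = time.perf_counter()
    proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL)
    # os.wait4 reaps this specific child and returns its resource usage.
    _pid, status, usage = os.wait4(proc.pid, 0)
    elapsed = time.perf_counter() - start
    # We reaped the child ourselves, so record the exit code on the Popen
    # object to keep it from trying to wait again (Python 3.9+).
    proc.returncode = os.waitstatus_to_exitcode(status)
    if proc.returncode != 0:
        raise RuntimeError(f"{cmd!r} exited with {proc.returncode}")
    return elapsed, usage.ru_maxrss


# Example, assuming comp.tar.gz is in the current directory:
#   seconds, peak = measure(["gzip", "-dc", "comp.tar.gz"])
```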

So bzip2 and rzip have their space advantages, but gzip is much faster
and uses a whole lot less memory.  There's still a large class of
machines where decompression time and memory are real issues...

Personally, I think we should stick with gzip for sets, but maybe we
could have an option for using bzip2, so people who want to use it
locally can?

Simon.
--
Simon Burge                            <simonb@wasabisystems.com>
NetBSD Support and Service:         http://www.wasabisystems.com/