Subject: Cheesy compression/decompression in filesystem namespace (was Re: New read & write syscalls)
To: Bill Studenmund <wrstuden@nas.nasa.gov>
From: Brian C. Grayson <bgrayson@marvin.ece.utexas.edu>
List: tech-kern
Date: 07/02/1999 03:57:43
On Wed, Jun 30, 1999 at 05:38:58PM -0700, Bill Studenmund wrote:
> On Thu, 1 Jul 1999, Hubert Feyrer wrote:
> 
> > On Wed, 30 Jun 1999, Bill Studenmund wrote:
> > > Thoughts?
> > 
> > Will you add a compressing filesystem you mentioned? :)
> 
> While the idea's nice, we need to work on layered fs's caching data first.
> I asked the Chucks about it, and it's an after-UBC thing.

  Late night ramblings, for those interested....

  A few months ago, I hacked mount_portal to provide an
automatic filtering capability, suitable for
compression/decompression.  Some example uses, where the portal
is mounted on /p:

wc /p/http/www.netbsd.org

tar cf /p/bzip2//tmp/bzip2tarfile.tar.bz2 ...  (creates a bzip2'd
    tar file, without having to teach tar how to bzip2 things)

tar xf /p/bunzip2//tmp/bzip2tarfile.tar.bz2

gv /p/bunzip2/`pwd`/thisfile.ps.bz2

tar xf /p/bunzip2//p/http/lore.ece.utexas.edu/~bgrayson/xosview/xosview-1.7.1.tar.bz2 (double-portal use!)

  You get the picture.  It doesn't cache, because I don't have
the skillset to write a caching layer ("cachefs").  But it's a cool
proof-of-concept.

  In general, you specify the following in the portal.conf file
for my filtering mods:

1.  match key -- pathname under mount point to be recognized
2.  "rfilter" or "wfilter" -- depending on whether this is a readable
    or writeable pipe/pseudofilter.
3.  strip key -- the mount point and the strip key are stripped
    from the pathname, and the result is remembered, to be used
    below.
4.  command to run
5.  optional arguments to the command.

  mount_portal will run popen("cmd args remembered-component") and
return the socket to the application for reads and writes.  Or at
least that is my meager understanding of the underlying goo.
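
  To make the rfilter/wfilter distinction concrete, here is a
minimal sketch of the popen-style behaviour in Python.  It is not
the mount_portal code; the function name open_filter is made up
for illustration, and it hands back a plain pipe rather than
passing a descriptor to the application the way the portal daemon
does.

```python
import subprocess


def open_filter(kind, command):
    """Start `command` via the shell and return (process, pipe).

    Hypothetical helper, not part of mount_portal:
      rfilter -> the caller reads the command's stdout
      wfilter -> the caller writes to the command's stdin
    """
    if kind == "rfilter":
        proc = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE)
        return proc, proc.stdout
    if kind == "wfilter":
        proc = subprocess.Popen(command, shell=True, stdin=subprocess.PIPE)
        return proc, proc.stdin
    raise ValueError("unknown filter type: %r" % (kind,))
```

  A reader would call open_filter("rfilter", "bunzip2 -c file.bz2")
and read from the returned pipe; a writer would use "wfilter",
write into the pipe, close it, and then wait on the process.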

  Examples:

#Match		type	stripkey	cmd & args
bzip2/		wfilter	bzip2/		bzip2 -9 >
bzip2://	wfilter	bzip2://	bzip2 -9 >
bunzip2/	rfilter	bunzip2/	bunzip2
http/		rfilter	http/		lynx -dump
http://		rfilter	http://		lynx -dump
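
  As a rough illustration of how such a table turns a pathname
into a command line, here is a sketch in Python.  The names
parse_conf and resolve are hypothetical, the first-match-wins
rule is an assumption, and the shlex.quote call is a precaution
added here that the popen() description above does not include.

```python
import shlex

CONF = """\
#Match      type     stripkey    cmd & args
bzip2/      wfilter  bzip2/      bzip2 -9 >
bzip2://    wfilter  bzip2://    bzip2 -9 >
bunzip2/    rfilter  bunzip2/    bunzip2
http/       rfilter  http/       lynx -dump
http://     rfilter  http://     lynx -dump
"""


def parse_conf(text):
    """Return a list of (match, kind, stripkey, command) tuples."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(None, 3)  # the 4th field keeps cmd and args together
        if len(fields) == 4:
            entries.append(tuple(fields))
    return entries


def resolve(path, mount_point, entries):
    """Map a path under the mount point to (kind, shell command), or None."""
    if not path.startswith(mount_point + "/"):
        return None
    rel = path[len(mount_point) + 1:]       # strip the mount point
    for match, kind, stripkey, command in entries:
        if rel.startswith(match):           # assumption: first match wins
            remembered = rel[len(stripkey):]  # strip the strip key
            return kind, command + " " + shlex.quote(remembered)
    return None
```

  With the table above, resolve("/p/bzip2//tmp/x.tar.bz2", "/p",
parse_conf(CONF)) yields the wfilter command
"bzip2 -9 > /tmp/x.tar.bz2".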

  (Actually, lynx -dump isn't 8-bit clean, so the tar example
above won't work as described.  I have a small module
that does a direct GET as part of the http portion of
mount_portal, rather than using lynx as an rfilter, so in reality
it does work.)

  I suspect this is as insecure as all blazes, but it's a
cool hack, IMHO!

  One cool application of this: the NetBSD FTP servers carry
both each large set file and its broken-down pieces (aa, ab, ac,
etc.).  With an appropriate rfilter, it could be set up
so that ftp.netbsd.org/pub/NetBSD/NetBSD-1.4/i386/binary/sets/base.tgz
is a symlink to, say,
/p/pieces2set//pub/NetBSD/NetBSD-1.4/i386/binary/sets/Split/base.,
where the pieces2set tool knows to take all files that match
/pub/NetBSD/NetBSD-1.4/i386/binary/sets/Split/base.*, cat them
together, and return that to the ftp daemon.
Presto!  The NetBSD ftp server has now cut its space
requirements by a factor of around two!  We can now keep more
releases on the FTP server at the same time, like 1.3, 1.3.1,
1.3.2, 1.3.3, 1.4, 1.4.1, and 1.5.
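
  The pieces2set tool does not exist yet; a minimal sketch of what
it might look like, in Python, is below.  It assumes the pieces
sort correctly by name (aa, ab, ac, ...) and that the prefix
arrives as the single command-line argument, as it would from an
rfilter entry.

```python
import glob
import shutil
import sys


def pieces2set(prefix, out):
    """Concatenate every file matching prefix + '*' into `out`.

    Pieces are taken in sorted name order, so base.aa comes
    before base.ab, and so on.  Returns the list of pieces used.
    """
    pieces = sorted(glob.glob(glob.escape(prefix) + "*"))
    for piece in pieces:
        with open(piece, "rb") as f:
            shutil.copyfileobj(f, out)
    return pieces


if __name__ == "__main__" and len(sys.argv) == 2:
    # Write the reassembled set to stdout for the portal to hand back.
    pieces2set(sys.argv[1], sys.stdout.buffer)
```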

  Anyway, off to bed.

  Brian