Subject: Re: Question on pipes (off-topic)
To: John Maier <jmaier@midamerica.net>
From: Kevin Cousins <kevin.cousins@praxa.com.au>
List: netbsd-users
Date: 08/29/2000 15:57:49
    Todd> tmpfile=/tmp/whoisdata.$$

    Mason> Or, perhaps better, since NetBSD supplies this cool
    Mason> functionality already:

    Mason>     tmpfile=`mktemp /tmp/whoisdata.XXXXX`
...
    Mason>     rm $tmpfile

Off topic, I know, but FIFOs can be a lot of fun (cf. mkfifo(1)).

When faced with prohibitively 'ken huge (even when compressed)
datasets, with just barely enough real storage for the compressed
input and output, and yet with significant preconditioning and
postconditioning to be done, I have been known to shuffle data
between processes through lengthy pipelines built from tee(1) and
several FIFOs.

  mkfifo input-data intermediate-result.1 # ...

  gunzip -c input.data.gz | tee input-data |
      ( first-processing-pipeline ) > intermediate-result.1 &

  cat input-data | 
      ( second-processing-pipeline ) > intermediate-result.2 &

  # ...

  cat intermediate-result.1 |
      ( last-processing-pipeline ) | gzip > output.data.gz &

Using this sort of approach, I was able to variously parse ~8 million
records (each 1K long), in several passes, in less than 15 minutes on
a 500MHz Alpha!  The compressed input was ~100MB.
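
For anyone who wants to try the idea on a small scale, here is a
minimal, self-contained sketch of the same pattern: one decompression
pass, with tee(1) copying the stream into a FIFO so that two consumers
read it at once. The file names and the grep/wc stages are
placeholders of my own choosing, standing in for the real processing
pipelines.

```shell
#!/bin/sh
# Sketch only: decompress once, feed two consumers via tee(1) and a FIFO.
# All names here (fifodemo, input-copy, the grep/wc stages) are hypothetical.
set -e

workdir=$(mktemp -d /tmp/fifodemo.XXXXXX)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Small stand-in for the real compressed dataset.
printf 'alpha\nbeta\ngamma\nbeta\n' | gzip > input.data.gz

mkfifo input-copy

# First consumer reads the FIFO in the background; opening it blocks
# until tee opens the other end for writing.
grep -c beta < input-copy > beta-count.txt &

# Single decompression pass: tee writes a copy into the FIFO while its
# stdout carries the same data on to the second consumer.
gunzip -c input.data.gz | tee input-copy | wc -l > line-count.txt

wait    # let the background reader finish before reading its result

# Some wc implementations pad the count with spaces; strip them.
lines=$(tr -d ' ' < line-count.txt)
beta=$(cat beta-count.txt)

echo "lines: $lines"
echo "beta:  $beta"
```

The ordering matters: each FIFO needs both a reader and a writer
before data flows, so every stage except the last is put in the
background and the script waits for it at the end. Nothing is ever
stored uncompressed on disk apart from the small result files.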

Still, this might be overkill for John's situation.  I like Mason's
way!

--
Kevin.