tech-userlevel: Re: pax-as-tar extract to stdout patch

Subject: Re: pax-as-tar extract to stdout patch
To: Greywolf <greywolf@starwolf.com>
From: Greg A. Woods <woods@weird.com>
List: tech-userlevel
Date: 06/17/2003 19:24:08
[ On Tuesday, June 17, 2003 at 15:12:15 (-0700), Greywolf wrote: ]
> Subject: Re: pax-as-tar extract to stdout patch
>
> ...buh?  I wasn't aware that 'tar' was non-POSIX.  Never mind that it's
> slowly dying, but it doesn't seem to be aware of this phenomenon.

Both 'tar' and 'cpio' as command-line interfaces were as good as dead
essentially _before_ POSIX got off the ground -- they've been long dead
ever since!

Don't you remember /usr/group and their initial UNIX standards effort
and the initial freeware implementation by Mark Colburn, sponsored by
USENIX and published in comp.sources.unix (v17i074)?

Even SunOS-4 had 'pax'!  (though of course it also had that 'bar' thing)

"Pax, germanus"

> GAW> I've also been trying to point out the complete silliness of complaining
> GAW> about the lack of ability to extract a file(s) to stdout in 'pax',
> GAW> especially in the case of scripted use, and especially in the case of
> GAW> the NetBSD system build process!
> 
> ...why is it silly, Greg?  Because it interferes with...what, exactly?

Because it's just an extremely silly whine in these circumstances.
Think about it.

About the only common use of extracting to stdout that makes any real
sense is when you've accidentally managed to wrap a tar archive
header around another archive file (ustar or whatever), and the whole
mess is on some kind of media you can't, or don't want to have to, first
copy to local disk, and you actually want to extract the files from the
archived archive:

	tar -O -x -f /dev/rst1 | tar -x -f -

However this ability is restricted to a very tiny number of 'tar'
implementations (two as far as I know, at least until this feature was
proposed for NetBSD's new implementation), so except in special
circumstances (which of course we do have in this one case) it's suicide
to rely on this feature in any script.

Every other case of wanting to extract one file from an archive requires
you to know exactly what pathname the archived file has and thus makes
it trivial, especially in a script, for the programmer to simply extract
the file onto the filesystem (perhaps in a temporary sub-directory if
necessary), and then reference it directly.
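
As a rough sketch of what I mean (the function name extract_member and
the scratch-directory layout are made up for illustration, and it assumes
a 'tar' that accepts -x and -f plus a mktemp(1) that supports -d):

```sh
#!/bin/sh
# Hypothetical helper: extract one named member into a scratch directory
# and print the pathname of the extracted copy.  The caller removes the
# scratch directory when it is finished with the file.
extract_member() {
    em_archive=$1
    em_member=$2

    # We change directory below, so make a relative archive path absolute.
    case $em_archive in
        /*) ;;
        *)  em_archive=$PWD/$em_archive ;;
    esac

    em_scratch=$(mktemp -d "${TMPDIR:-/tmp}/extract.XXXXXX") || return 1

    # Read the whole archive in the normal way; if the member appears
    # more than once, the last copy is the one left on disk.
    if (cd "$em_scratch" && tar -x -f "$em_archive" "$em_member"); then
        printf '%s\n' "$em_scratch/$em_member"
    else
        rm -rf "$em_scratch"
        return 1
    fi
}
```

A script would then do something like path=$(extract_member sets.tar
etc/motd) and read "$path" directly; the archive and member names there
are only placeholders.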

Remember that TAR files can contain more than one copy of a file, and if
you blindly just extract a file to stdout you could end up reading
multiple copies of that file without any way to tell where the last one
starts (unless the file is itself an archive which you can just continue
extracting, and you don't have to worry that the last copy might not have
some enclosed file(s) that the first copy had).  If you want your script
to be safe and reliable you _MUST_ extract the file in the normal way,
processing the whole archive from start to end, and then refer to the
extracted copy of the file on your local filesystem.  (--fast-read
considered very dangerous by default!)
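
A small demonstration of the duplicate-member problem, as a sketch only:
the file names are invented, and it assumes a 'tar' that supports
appending with -r and extracting to stdout with -O (GNU tar does; many
others do not, which is rather the point):

```sh
#!/bin/sh
# Build an archive holding two copies of the same pathname.
work=$(mktemp -d "${TMPDIR:-/tmp}/dupdemo.XXXXXX") || exit 1
cd "$work" || exit 1

printf 'first\n' > note.txt
tar -c -f demo.tar note.txt      # archive the first version
printf 'second\n' > note.txt
tar -r -f demo.tar note.txt      # append a second version, same name
rm note.txt

# Extracting to stdout streams every matching copy back to back, with
# nothing in the output to mark where one copy ends and the next begins.
streamed=$(tar -x -O -f demo.tar note.txt)

# Extracting in the normal way reads the whole archive, and the later
# copy replaces the earlier one on disk.
tar -x -f demo.tar note.txt
final=$(cat note.txt)

printf 'streamed:\n%s\n' "$streamed"
printf 'on disk:\n%s\n' "$final"
```

Only the on-disk copy is unambiguous; the streamed output is both
versions run together.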

-- 
								Greg A. Woods

+1 416 218-0098;            <g.a.woods@ieee.org>;           <woods@robohack.ca>
Planix, Inc. <woods@planix.com>; VE3TCP; Secrets of the Weird <woods@weird.com>