Subject: Re: GNU tar goodbye?
To: NetBSD Userlevel Technical Discussion List <tech-userlevel@NetBSD.ORG>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-userlevel
Date: 10/11/2002 13:31:27
On Fri, 11 Oct 2002, Greg A. Woods wrote:

> [ On Thursday, October 10, 2002 at 15:56:07 (-0700), Bill Studenmund wrote: ]
> > Subject: Re: GNU tar goodbye?
> >
> > This comment reflects a lack of listening to others. If you won't listen
> > to them, why should they listen to you?
>
> If you've got facts and technical explanations to show that
> '--fast-read' is somehow critical I'm sure we'd all like to hear them.
> I'm quite curious as to what such reasons could be myself.

Your tone above does little to encourage discourse. Your tone indicates
that you have decided '--fast-read' is not important, and that that's
that.

It doesn't matter why people want it; they want it, and we have it now.
Switching from tar to pax-as-tar without --fast-read would be a step
backwards for a number of our users.

> Technically speaking though it seems pretty clear with even minimal
> analysis that '--fast-read' is simply an unnecessary optimisation.  No
> functionality is lost without it, and it could even safely be
> implemented as a no-op.  I don't have it in my implementation and I
> haven't encountered any problems whatsoever.  OpenBSD doesn't have it in
> theirs and I've not found any online reference to any problems over
> there either.
>
> I.e. any claim that '--fast-read' is a necessary feature before pax can
> successfully replace GNU Tar seems to be, technically speaking, wrong.

You'd do poorly in customer relations. The customers have it and they want
it. We should give it to them.

Also, we're talking about a time optimization. By definition, things
will work without it, just slower. Arguing that we don't need it because
things work without it misses the point.

One of the main reasons we want it is for supporting binary packages. One
of the first things we do is grab the +CONTENTS file. There will be only
one, and it is usually at the front of the archive. So it's REALLY SILLY
to read the whole archive just to confirm that there isn't a second copy
of a file we already knew appears only once.
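
As a rough illustration of the behaviour described above (this is not the
tar or pax source, just a sketch using Python's standard tarfile module),
the idea is to read the archive as a forward-only stream and stop at the
first member whose name matches. The helper names and the demo archive
contents are hypothetical, made up for the example.

```python
import io
import os
import tarfile


def extract_first_match(archive, name):
    """Return the bytes of the first member called `name`, then stop reading."""
    # Mode "r|*" treats the archive as a forward-only stream, so nothing
    # past the matching member is read or decompressed.
    with tarfile.open(fileobj=archive, mode="r|*") as tar:
        for member in tar:
            if member.name == name:
                return tar.extractfile(member).read()
    return None


def build_demo_archive():
    """Build a gzip-compressed tar in memory with +CONTENTS first (made-up data)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        members = (
            ("+CONTENTS", b"@name demo-1.0\n"),
            # Incompressible filler standing in for the bulk of a package.
            ("payload.bin", os.urandom(1 << 20)),
        )
        for member_name, payload in members:
            info = tarfile.TarInfo(member_name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


if __name__ == "__main__":
    data = build_demo_archive()
    stream = io.BytesIO(data)
    contents = extract_first_match(stream, "+CONTENTS")
    print(contents, "after reading", stream.tell(), "of", len(data), "bytes")
```

Because the match is near the front, only a small prefix of the compressed
stream is consumed; without the early return, the loop would decompress
and walk every remaining member before finishing.
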

Here's some times on a fast machine (ftp.netbsd.org, a speedy Athlon XP
box) for getting +CONTENTS. Yes, I chose the fattest binary package I
could find, but it shows the point:

babylon5: {20} time tar -xzf /ftp/pub/NetBSD/packages/1.6/i386/All/openoffice-0.0.0.641nb1.tgz --fast-read +CONTENTS
0.000u 0.006s 0:00.05 0.0%      0+0k 1+6io 0pf+0w
babylon5: {21} time tar -xzf /ftp/pub/NetBSD/packages/1.6/i386/All/openoffice-0.0.0.641nb1.tgz +CONTENTS
4.410u 1.155s 0:07.28 76.3%     0+0k 1+7io 0pf+0w
babylon5: {22} time tar -xzf /ftp/pub/NetBSD/packages/1.6/i386/All/openoffice-0.0.0.641nb1.tgz +CONTENTS
4.368u 1.256s 0:07.93 70.7%     0+0k 0+5io 0pf+0w

Note: I timed the run without --fast-read twice to see whether caching
made much difference. It didn't.

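So dropping --fast-read means roughly a 200x increase in system time
(0.006s versus about 1.2s), plus more than four seconds of user time that
the --fast-read run doesn't spend at all.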
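
For anyone who wants to check the arithmetic, here is a small sketch that
computes the ratios from the three timing lines above. The numbers are
copied from the transcript (csh reports user, system, then elapsed time);
the variable and function names are just for illustration.

```python
# Timings in seconds, copied from the transcript above.
fast = {"user": 0.000, "sys": 0.006, "real": 0.05}
full_runs = [
    {"user": 4.410, "sys": 1.155, "real": 7.28},
    {"user": 4.368, "sys": 1.256, "real": 7.93},
]


def slowdown(field):
    """Ratio of each full-read run to the --fast-read run for one field."""
    return [run[field] / fast[field] for run in full_runs]


sys_ratio = slowdown("sys")    # roughly 190x to 210x
real_ratio = slowdown("real")  # roughly 145x to 160x

# User time for the --fast-read run was reported as 0.000, so a ratio is
# not meaningful there; the absolute difference is what matters.
user_extra = [run["user"] - fast["user"] for run in full_runs]

if __name__ == "__main__":
    print("system time ratio:", [round(r) for r in sys_ratio])
    print("elapsed time ratio:", [round(r) for r in real_ratio])
    print("extra user seconds:", user_extra)
```
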

Yes, I chose the largest file I could find, on a fast machine. But I think
that would be somewhat indicative of what we might see on a slower machine
with a more modest binary package.

Take care,

Bill