Subject: pkg tarball proposal (was RE: automatic package statistics)
To: None <feyrer@rfhs8012.fh-regensburg.de, root@garbled.net>
From: Ross Harvey <ross@ghs.com>
List: tech-pkg
Date: 10/14/1999 15:06:10
> From: Hubert Feyrer <feyrer@rfhs8012.fh-regensburg.de>
> On Thu, 14 Oct 1999, Tim Rightnour wrote:
> > I think the right thing to do is write a perl script that looks at the logs for
> > ftp.netbsd.org, and records how many times that distfile is downloaded, or
> > attempted to be downloaded. (if that one is in the logs, not sure).
>
> This is very very true - the ftp server's logs should be really used for
> some statistics. Maybe I should find time to continue that project ...
>
>
> > What I would not object to, is logic in bsd.pkg.mk, that sends out information
> > like the above, but must be explicitly turned on by the user, and ships with a
> > NO in mk.conf.example, with a comment urging people to turn it on (with proper
> > explanations of what it does).
>
> Well, given all the reasons stated here by many people, I'd prefer not to
> see such code for now.

[ Note, if this is too long for you, please skip directly down to the
proposal: ``== Tarballs for Pkgs and Distfiles ===''. ]

This whole thing brings up a problem I keep having: we need to have a fast
way to identify what to put on CDs. Note that the CD recorder is likely to
be under extreme time pressure, so he won't be able to do anything fancy.

The basic problem is that the release mechanism for packages doesn't
address distribution at all. Only for *source* and for the cdrom-readme
target do we deal with distribution.

There are a lot of issues. What's more important than an exact balance
is that some interim solution be adopted. Various considerations are:

   *  licenses

   *  distfiles vs binaries

   *  package popularity

     +  always distribute the most popular packages -- OR --

     +  rotate the distributed packages so someone buying a series of
        sequential CD distributions gets an increasing selection

   *  lots of small pkgs vs a few large ones

   *  special netbsd-packages disks? (see below re: space problems)

   *  expanded source and pkgsrc?  and still include the source tarballs?
      (a pkgs issue, because it uses most of the space that would have
      been available for pkgs)

The base netbsd distribution now takes up more than one CD. For 1.4.1, I
was "lucky" and a couple of our zillion ports didn't get built, so for that
reason alone I was able to make disk 1 exportable by moving the secr.tgz
files and source tarballs to disk 2.

However, in the past strong arguments were made that we should ship an
expanded source tree, and it's traditional on our disks to ship expanded
pkgsrc as well. So both source tarballs and expanded source now end
up combined on disk 2.

As you can imagine, when THAT is done there is not so much space left on
disk 2. As a result, I've decided so far not to ship any compiled pkg
binaries, and have used the available space for just a few distfiles.

Now, a one-architecture CD faces a similar decision. If you unpack source
and pkgsrc, your initially healthy space for distfiles and compiled pkgs
vanishes quickly.

I would propose, at the vague, concept-level, hand-waving stage ...

		== Tarballs for Pkgs and Distfiles ===

	This would be an analog of src/distrib/sets but for pkgsrc instead
	of src. It would package up collections of distfiles in discrete
	tarballs, and collections of compiled binaries in .bz2 tarballs.

	Possibly this would depend on a priority number given to each pkg.
	Everything of priority 1 and its dependencies would go into tarball
	#1, everything of priority 2 and what it depends on (and which
	isn't already in #1) goes to 2, ...
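
As a rough illustration of the priority rule above, here is a minimal
sketch in Python. Nothing in it exists in pkgsrc: the function name, the
two dictionaries, and the sample packages and dependencies are all made
up for the example. It only assumes that each pkg has a priority number
and a list of direct dependencies.

```python
def assign_tarballs(priority, depends):
    """Group packages into numbered tarballs by priority.

    priority: dict mapping pkg name -> priority number (1 is highest).
    depends:  dict mapping pkg name -> list of direct dependencies.
    Both structures are hypothetical; pkgsrc has no such priority field.
    """
    assigned = {}  # pkg name -> tarball number

    def closure(pkg, seen):
        # Collect pkg and everything it depends on, directly or indirectly.
        if pkg in seen:
            return
        seen.add(pkg)
        for dep in depends.get(pkg, ()):
            closure(dep, seen)

    # Walk priorities from 1 upward, so a dependency always lands in the
    # lowest-numbered tarball that needs it.
    for level in sorted(set(priority.values())):
        wanted = set()
        for pkg, number in priority.items():
            if number == level:
                closure(pkg, wanted)
        for pkg in wanted:
            # Keep the earlier assignment if the pkg is already in a tarball.
            assigned.setdefault(pkg, level)

    tarballs = {}
    for pkg, level in assigned.items():
        tarballs.setdefault(level, []).append(pkg)
    return {level: sorted(pkgs) for level, pkgs in tarballs.items()}


# Illustrative data only; these dependency lists are not taken from pkgsrc.
example_priority = {"xv": 1, "emacs": 2, "gimp": 2}
example_depends = {"xv": ["jpeg", "tiff"], "tiff": ["jpeg"],
                   "gimp": ["gtk", "jpeg"]}
example_sets = assign_tarballs(example_priority, example_depends)
```

In this example jpeg is needed by both xv and gimp, and it ends up only
in tarball #1, which matches the "isn't already in #1" rule.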

	The idea is that these can be downloaded by the CD recording engineer
	until his disk is full, at which point I suppose they are unpacked
	before recording. (Or, teach pkgsrc how to extract distfiles and
	binaries from the tarballs.)

	The decision to rotate or prioritize tarballs is a policy one and
	this mechanism kind of supports either.
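
The "download until the disk is full" step, with either policy, could be
sketched like this. Again this is only a hedged illustration in Python:
the function, the tarball names, and the sizes are hypothetical. Rotation
is modeled simply as starting from a different tarball on each release.

```python
def pick_tarballs(tarballs, free_mb, start=0):
    """Choose tarballs in order until the next one no longer fits.

    tarballs: list of (name, size_mb) pairs in priority order.
    free_mb:  space left on the disk, in megabytes.
    start:    index of the first tarball to consider; 0 means strict
              priority order, other values rotate the list.
    Returns (chosen names, megabytes still free).
    """
    order = tarballs[start:] + tarballs[:start]
    chosen = []
    for name, size_mb in order:
        if size_mb > free_mb:
            # Stop at the first tarball that does not fit, so a later
            # tarball never goes on the disk without the ones before it.
            break
        chosen.append(name)
        free_mb -= size_mb
    return chosen, free_mb


# Made-up names and sizes, for illustration only.
example = [("pkg-1", 200), ("pkg-2", 150), ("pkg-3", 100)]
by_priority = pick_tarballs(example, 400)         # (['pkg-1', 'pkg-2'], 50)
rotated = pick_tarballs(example, 400, start=1)    # (['pkg-2', 'pkg-3'], 150)
```

Stopping at the first misfit is a choice made for the sketch; filling
the leftover space with smaller tarballs would be an equally valid policy.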

There are other approaches. This could all be indirected, with the mechanism
making only a series of lists of distfiles and a series of lists of compiled
pkgs. In this case, it should be able to download the expansion of those
lists and also back off and remove the expansion of one. (BTW, one cool
subproposal would be a target for installing a binary pkg from the Makefile
in the pkgsrc subdirectory. The tree does organize the packages nicely and
this would eliminate the need to type the complex and non-obvious
sometimes-version-full binary package name.)

Note that saying "go to three disks" is orthogonal to this problem. We
would still need a selection mechanism. Now, a special packages-CD would
be good.  It might not need to be respun with the point releases, for
example, and it would not suffer from the source tarballs and source trees
hogging space.  (If we continue to provide them, that is.) If pkgs were on
a separate CD, then disk #2 could be made smaller. That has certain advantages,
because disk #2 comes in two flavors: exportable and non.  Making it
smaller makes it easier to do CD-R runs for it, whereas disk #1 is for
everyone and is reasonably pressed. You can see that the distribution
mechanism should be able to work with a wide range of available megabytes.
(And DVD distributions will eventually happen.)

	ross.harvey@computer.org