Subject: Re: Proposal: unification of distfiles for FreeBSD and NetBSD
To: Michal Pasternak <michal@pasternak.w.lub.pl>
From: Greg A. Woods <woods@weird.com>
List: tech-pkg
Date: 10/01/2003 17:54:56
[ On Wednesday, October 1, 2003 at 16:58:04 (+0000), Michal Pasternak wrote: ]
> Subject: Re: Proposal: unification of distfiles for FreeBSD and NetBSD
>
> I have always seen *BSD family as operating systems, that have everything
> optimized.

Hmmm... yes, well I agree with the first part but I'm not so sure the
second part is also always true, and it's definitely not true of the
family as a whole.  ;-)

The very existence of a *BSD family in the first place, as opposed to
there being one unified BSD, shouts of excess and repetition!  :-)

Perhaps anything the family does as a whole to optimise their collective
resources is a good thing. ;-)

Using a common distfiles directory has certainly proven to be a very
significant optimisation for myself since it saves not only disk space
but also bandwidth and download time.  Someone mentioned using Squid as
a distfiles cache, but I find having a special-purpose NFS-shared cache
directory much better (and I already had one anyway to share amongst my
NetBSD build machines ;-).

I don't know where the break even point is for making it worthwhile to
merge distfiles on public mirror sites, but for myself it seems to be
worthwhile even if it only avoids replicating downloads for about 25% of
the packages I build on both platforms (after all the only "cost" to me
is just making an NFS mount and setting a Makefile macro).  I don't
worry about accidentally removing any older version still needed on
another platform because I'm either going to want to update the other
platform too, or else I can just download it again as I would have done
had I not tried to use a shared cache in the first place.  I also don't
worry about clashing downloads because I can't parallel-task well enough
to cause that to happen, and I only build a relatively small set of the
whole collection of packages.
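
As a rough way to check where that break-even point lies for a given
pair of trees, one could compare the two distfiles listings and see how
much actually overlaps.  The following is only a sketch under stated
assumptions: the function name overlap_report and the name-to-size
dictionaries are hypothetical, not something pkgsrc or ports provides,
and it treats a file as shared only when both the name and the size
agree (a real comparison would want checksums).

```python
def overlap_report(netbsd, freebsd):
    """Compare two distfiles listings.

    Each argument is a hypothetical mapping of distfile name -> size in
    bytes, e.g. built by walking each platform's distfiles directory.
    """
    # A file counts as shared only if name and size match in both trees.
    shared = {
        name for name in set(netbsd) & set(freebsd)
        if netbsd[name] == freebsd[name]
    }
    all_names = set(netbsd) | set(freebsd)
    return {
        "shared_files": sorted(shared),
        # Fraction of distinct file names usable from one common copy.
        "shared_fraction": len(shared) / len(all_names) if all_names else 0.0,
        # Bytes that need not be downloaded or stored a second time.
        "bytes_saved": sum(netbsd[name] for name in shared),
        # Bytes used when the two directories are kept fully separate.
        "bytes_separate": sum(netbsd.values()) + sum(freebsd.values()),
    }


if __name__ == "__main__":
    # Made-up sizes purely for illustration.
    netbsd = {"foo-1.0.tar.gz": 100, "bar-2.1.tar.gz": 200}
    freebsd = {"foo-1.0.tar.gz": 100, "bar-2.1.tar.gz": 250, "baz-0.3.tar.bz2": 300}
    report = overlap_report(netbsd, freebsd)
    print(report["shared_files"], report["bytes_saved"])
```

Even a small shared fraction costs nothing extra to exploit, since a
miss in the shared directory just means fetching the file as usual.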

So ultimately the unification of distfiles mirrors for the *BSD family
should be a useful optimisation, even if it is never done perfectly.
The more co-ordination on policy issues such as choice of preferred
compression formats and sub-directory naming schemes, the better the
optimisation, but even with no co-ordination at all it is still better
than having two or more completely separate distfiles directories which
will inevitably contain many of the same source archives.

I don't think the GPL issues really affect public distfiles mirrors
since strictly they only mirror (i.e. redistribute) sources, not
binaries (though of course the same site may also (re)distribute
binaries in some other section) -- and I agree with Fredrick that it's
best for all involved to always redistribute the original source
archives (and pkgsrc/ports modules with their patches) with every binary
distribution on physical media, regardless of whether the GPL applies or
not.  I don't think that means binaries have to be avoided between
releases though -- only that if they are distributed on CD, DVD, or
whatever then they should also be accompanied by the original source archives
and patches.  After all anyone who can fetch a binary by FTP can also
fetch a source archive by FTP.  I think old versions can still be
cleaned out of the distfiles mirrors as old branches of pkgsrc, ports,
etc. are EOL'ed.  It may mean though that mirror sites also distributing
binaries should remove EOL binary releases too, unless they also keep a
historical copy of the old source archives alongside as well.
Historical archives are a whole different concern anyway -- they're not
caches and are not intended to be "efficient" in the first place.

-- 
						Greg A. Woods

+1 416 218-0098                  VE3TCP            RoboHack <woods@robohack.ca>
Planix, Inc. <woods@planix.com>          Secrets of the Weird <woods@weird.com>