Subject: Handling of wildcard dependencies over FTP [long]
To: None <tech-pkg@netbsd.org>
From: Hubert Feyrer <feyrer@rfhs8012.fh-regensburg.de>
List: tech-pkg
Date: 02/28/1999 05:00:56
	Abstract: This document first recalls how dependencies work
	in the NetBSD Packages System, then outlines a way to support
	installing binary packages that contain wildcard dependencies
	via FTP, and describes the current state of that work.


0) Let me first remind you how dependency handling works right now

Leaving aside RUN/BUILD_DEPENDS, our current dependency scheme works
by specifying lines in a package's Makefile like:

	DEPENDS+=	foo-1.2:../../somecat/foo

This specifies two things:
1. The version of "foo" this package depends on. If this package is to
   be installed, foo-1.2 needs to be present. 
2. A fallback location, used to build "foo" via the packages system if
   the required version is not installed on the system.

Upon installation, if the required version of "foo" is installed,
everything is fine and installation proceeds, both when building from
pkgsrc and when installing binary packages via pkg_add.

If the required version is not installed, the build system will use
the given fallback directory to build and install the package available
at that place, in the hope of fulfilling the dependency. For
installation via binary packages, pkg_add will assume there's a
foo-1.2.tgz binary package out there and install it.

Hardcoding the wanted version of "foo" is a maintenance burden, and
most of the time it's not even a fixed version that's needed; some, or
even any, installed version would do, resulting in a dependency setting
such as

	DEPENDS+=	foo-*:../../somecat/foo

This indicates that "any" version of "foo" will do. The build system
checks whether any "foo" is installed by calling "pkg_info -e foo-*",
which scans all installed packages and accepts whatever version of
"foo" is present, IF one is installed at all. If none is installed, the
build system will build and install whatever version happens to be in
pkgsrc, using the given fallback.
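
To make that concrete, here's a minimal sketch (in C, NOT the actual
pkgsrc/pkg_install code) of the "is any foo installed?" test; the
pkg_installed() helper and its use of system() are just for
illustration, the pattern is the one from the DEPENDS line:

	#include <stdio.h>
	#include <stdlib.h>

	/* sketch: is any package matching "pattern" installed? */
	static int
	pkg_installed(const char *pattern)		/* e.g. "foo-*" */
	{
		char cmd[1024];

		/* pkg_info -e exits 0 iff a matching package is installed */
		snprintf(cmd, sizeof(cmd),
		    "pkg_info -e '%s' >/dev/null 2>&1", pattern);
		return system(cmd) == 0;
	}

	int
	main(void)
	{
		if (pkg_installed("foo-*"))
			printf("dependency already satisfied\n");
		else
			printf("not installed - use the fallback directory\n");
		return 0;
	}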

For binary packages, the handling is more complex. Any binary package
properly knows that it depends on "any" version of the "foo" package,
and does the same check as the build system to find out whether one is
installed. If any acceptable version is installed, fine. If not, we're
in a bit of trouble fulfilling the requirement to (automatically)
install "any" version of the "foo" package: the given "fallback"
directory can't be used in the context of binary packages, as
rebuilding from source is not an option there.

Instead, pkg_add goes out, scans all the available (binary!) packages,
and will then install the most recent one that still meets the
requirement criteria, "foo-*". This scanning is necessary to support
the wildcard notion. If the dependency uses some version limitation (a
``dewey depend''), e.g. foo<1.0, then the latest binary package
available below version 1.0 will be found and installed, thus
fulfilling the required dependency.
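
Just to illustrate the ``dewey'' comparison, here's a rough sketch of
comparing two version strings component-wise; this is simplified
(purely numeric, dot-separated components only) and is NOT pkg_add's
actual dewey code:

	#include <stdio.h>
	#include <stdlib.h>

	/* return <0, 0, >0 if version a is older, equal, newer than b */
	static int
	dewey_cmp(const char *a, const char *b)
	{
		while (*a != '\0' || *b != '\0') {
			char *ea, *eb;
			long na = strtol(a, &ea, 10);
			long nb = strtol(b, &eb, 10);

			if (na != nb)
				return na < nb ? -1 : 1;
			if (ea == a && eb == b)	/* nothing numeric left */
				break;
			a = (*ea == '.') ? ea + 1 : ea;
			b = (*eb == '.') ? eb + 1 : eb;
		}
		return 0;
	}

	int
	main(void)
	{
		/* "foo<1.0" accepts foo-0.9.1 but not foo-1.1 */
		printf("%d\n", dewey_cmp("0.9.1", "1.0") < 0);	/* 1 */
		printf("%d\n", dewey_cmp("1.1", "1.0") < 0);	/* 0 */
		return 0;
	}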

Code for this has been in pkg_add for several months now, with (only)
one part missing:

Binary packages can be installed not only from local disk but also via
FTP, and the "directory scanning" described above is not implemented
for FTP, only for installation from local packages.


1) Alternatives

There was some discussion about whether to include a (fixed) fallback
version in binary packages to avoid the directory scanning, and to
insist on that fixed version being present on the FTP server for
download. This solution has two major consequences:

 - losing the flexibility of wildcards 
 - different behaviour for one part of the dependency system

Implementing fixed fallback versions would take the wildcard
dependencies (used for everything but FTP installs) back to depending
on fixed version numbers. This introduces an inconsistency with the
semantics of how the dependency system works in other areas
(determining whether the wanted package is already installed,
installation from local disk).  Furthermore, we could easily run into
the need to keep several versions of a package available if several
binary packages were built at different times (i.e. one has -1.0 as
fallback, the next has -1.1, ...).

The remaining argument is that (for the case of DEPENDS=foo-*:...), if
a future version of the "foo" package shows up on the FTP site that is
incompatible with the package to be installed, the incompatible version
will still be installed. This is a valid point, but assuming that when
such a version inconsistency is detected, the package in question is
updated to contain the proper (dewey) depends, the updated package will
show up on the FTP site at the same time as well, getting things back
into sync immediately.

With fixed fallback versions, binary packages need to be updated
whenever a package they depend on changes version, whereas with the
scanning outlined above, updating the depending binary package is only
necessary if some incompatible version shows up in the future.

All in all, the "directory scanning" approach provides more
flexibility and less maintenance effort.


2) Implementation

2.1) pkg_*

The plan is to change the internal interfaces of the pkg_* tools in
all the needed places to accept not only local filenames but also URLs,
taking the proper action on the latter where needed. The code for
establishing the connection, retrieving files, etc. is implemented by
forking off an ftp(1) coprocess, which is then remote-controlled and
fed the appropriate commands.

Functions affected are ftpGetURL(), fileGetURL(), fileFindByPath(),
fileGetContents(), isfile(), etc.
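
As an illustration of the kind of change involved (names made up, not
the real pkg_* code): something like is_url() decides by looking at
the prefix, and a function like isfile() then has to accept URLs as
well as regular files:

	#include <stdio.h>
	#include <string.h>
	#include <sys/stat.h>

	/* sketch: does this "filename" actually name an FTP URL? */
	static int
	is_url(const char *name)
	{
		return strncmp(name, "ftp://", 6) == 0;
	}

	/* what isfile() could become: true for regular files OR URLs */
	static int
	isfile_or_url(const char *name)
	{
		struct stat sb;

		if (is_url(name))
			return 1;	/* let the FTP code sort it out later */
		return stat(name, &sb) == 0 && S_ISREG(sb.st_mode);
	}

	int
	main(void)
	{
		printf("%d\n", isfile_or_url("ftp://ftp.netbsd.org/pub/a.tgz"));
		printf("%d\n", isfile_or_url("/etc/passwd"));
		return 0;
	}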


2.2) Connection caching

Imagine package "a-1.0" depending on package "b-1.*", which needs "c-*",
etc. - something like "kde". Now, the (net-)actions needed are (roughly):

 - grab +CONTENTS of "a-1.0"
 - find out which versions of "b" are available
 - grab +CONTENTS of "b-1.whateverisavailable"
 - find out which versions of "c" are available
 - grab +CONTENTS of "c-something"
 - grab rest of "c-something"
 - grab rest of "b-1.whateverisavailable"
 - grab rest of "a-1.0"

Even if the "grab rest of ..." operation can be implemented by
re-using the same connection (FTP or whatever), any new package is
still added by a new pkg_add process, which would potentially open
another connection to the same FTP site, resulting in

 - flooding the remote site with connects
 - wasting time on extra connection establishment

This can be avoided by using the same FTP session across all three
pkg_add processes. The process running ftp(1) has two pipes open for
stdin and stdout; these are passed down to subsequent pkg_add commands,
which learn the file descriptors from environment variables.
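
Here's a stripped-down sketch of that coprocess setup; the environment
variable names (PKG_FTPIO_IN/PKG_FTPIO_OUT) are made up for the example
and not necessarily what the real code will use:

	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>

	int
	main(void)
	{
		int to_ftp[2], from_ftp[2];
		char buf[32];

		if (pipe(to_ftp) == -1 || pipe(from_ftp) == -1)
			return 1;

		switch (fork()) {
		case -1:
			return 1;
		case 0:			/* child: becomes the ftp(1) coprocess */
			dup2(to_ftp[0], STDIN_FILENO);
			dup2(from_ftp[1], STDOUT_FILENO);
			close(to_ftp[1]);
			close(from_ftp[0]);
			execlp("ftp", "ftp", (char *)NULL);
			_exit(127);
		default:		/* parent: keeps its ends of the pipes */
			close(to_ftp[0]);
			close(from_ftp[1]);
			snprintf(buf, sizeof(buf), "%d", to_ftp[1]);
			setenv("PKG_FTPIO_IN", buf, 1);
			snprintf(buf, sizeof(buf), "%d", from_ftp[0]);
			setenv("PKG_FTPIO_OUT", buf, 1);
			/* ... feed ftp(1) commands, fork further pkg_adds;
			   they inherit the descriptors and find their
			   numbers in the environment ... */
			break;
		}
		return 0;
	}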


3) State

Code exists to do the connection caching across the various pkg_add
processes, as well as routines implementing the needed functions for
scanning etc., including checking for any errors returned by the ftp(1)
process.

The next step is to rework this into functions that can then be
included either directly into the various functions in pkg_* (see 2.1),
or hooked in via some table indexed by access method (local disk, ftp,
...).
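
For the second variant, such a table could look roughly like this (all
names hypothetical, the handlers just stubs):

	#include <stdio.h>
	#include <string.h>

	struct access_ops {
		const char *prefix;		/* "" = local disk */
		int (*exists)(const char *name);
		int (*get_file)(const char *name, const char *local_tmp);
	};

	static int local_exists(const char *n) { return 0; }
	static int local_get(const char *n, const char *t) { return 0; }
	static int ftp_exists(const char *n) { return 0; }
	static int ftp_get(const char *n, const char *t) { return 0; }

	static const struct access_ops ops[] = {
		{ "ftp://", ftp_exists,   ftp_get   },
		{ "",       local_exists, local_get },	/* catch-all */
	};

	/* pick the handlers based on how the name starts */
	static const struct access_ops *
	lookup_ops(const char *name)
	{
		size_t i;

		for (i = 0; i < sizeof(ops) / sizeof(ops[0]); i++)
			if (strncmp(name, ops[i].prefix,
			    strlen(ops[i].prefix)) == 0)
				return &ops[i];
		return &ops[1];		/* "" always matches anyway */
	}

	int
	main(void)
	{
		printf("%s\n", lookup_ops("ftp://host/pkg.tgz")->prefix);
		return 0;
	}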

Work on this has stopped due to real life kicking in hard; I expect to
get back to it by the end of March, which leaves it out for 1.4.


4) Impact

When this code is (finally) in, we can use wildcards in DEPENDS.


Stay tuned,

	   Hubert
-- 
Hubert Feyrer <hubert.feyrer@rz.uni-regensburg.de>