Subject: (Incomplete) List of pkgsrc Improvements
To: None <tech-pkg@NetBSD.org>
From: Dieter Baron <dillo@danbala.tuwien.ac.at>
List: tech-pkg
Date: 07/31/2007 12:17:46
		(Incomplete) List of pkgsrc Improvements

  This is a list of improvements to lay the foundations for overcoming
the current usability limitations of our binary packages.  Comments,
additions and volunteers for implementing them are welcome.


1. pkg_install improvements

1.1 include common INSTALL snippets

  GOAL: Currently, commonly used INSTALL snippets (like creating
users/groups) are shipped with each binary package.  I would like to
make all these snippets a part of pkg_install, and just note what
needs to be done in the package itself.

  RATIONALE: If we discover a serious flaw in one of these scripts, we
can update the one version in pkg_install, require (or recommend) an
upgrade of pkg_install and all binary packages out there are
automatically fixed.  Also, this reduces the amount of code contained
in a package that is run as root.

  Less importantly, it also saves space on our ftp server.


1.2 make pkg_add use a config file

  GOAL: Currently, some aspects of pkg_add can be controlled via
environment variables (e.g. whether rc.d scripts are placed in
/etc/rc.d).  I would like pkg_add to use a config file for these kinds
of settings instead.

  Also, pkg_add should consult the config file for the list of
acceptable licenses.

  RATIONALE: I expect the number of settings a sysadmin can configure
to increase (see also next item), and a config file is the usual way
Unix tools are configured.
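
  As a rough illustration of what such a config file could carry, here
is a minimal sketch of a parser.  The "KEY = value" syntax and the
choice of setting names are assumptions made for this example only;
the actual file format is still to be decided.

```python
# Sketch only: the "KEY = value" syntax and the settings shown here are
# assumptions for illustration, not an existing pkg_install config format.

def parse_config(text):
    """Parse 'KEY = value' lines, ignoring blank lines and '#' comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        key, separator, value = line.partition("=")
        if not separator:
            raise ValueError(f"malformed line: {raw_line!r}")
        settings[key.strip()] = value.strip()
    return settings


def license_acceptable(settings, license_name):
    """Return True if license_name is in the whitespace-separated list."""
    return license_name in settings.get("ACCEPTABLE_LICENSES", "").split()


EXAMPLE = """\
# hypothetical pkg_add configuration
PKG_RCD_SCRIPTS = yes
ACCEPTABLE_LICENSES = gnu-gpl-v2 modified-bsd
"""

config = parse_config(EXAMPLE)
print(config["PKG_RCD_SCRIPTS"])
print(license_acceptable(config, "modified-bsd"))
```

  Keeping the license list in the same file means pkg_add has a single
place to look when deciding whether a binary package may be installed.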


1.3 sysadmin control over installation actions

  GOAL: Make the steps pkg_add executes (e.g. create user/group, run
install script) configurable by the sysadmin, via the config file (see
above) and possibly via command line switches.  Also, add a way to
(re)run one or more of these steps later.

  RATIONALE: This gives the sysadmin enough control to handle special
situations and to review security implications.  It would also allow
PREFIX to be shared among multiple machines (e.g. via NFS), since the
steps that must be run on each machine (create user/group, copy
config files from examples to etc) could then easily be executed on
the client machines.
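
  The step selection could be sketched as follows.  The step names
and the function are hypothetical; the point is only that a package
declares its steps, and the sysadmin's settings filter that list.

```python
# Sketch only: the step names are made up for illustration.

def plan_steps(declared, disabled=frozenset(), only=None):
    """Select the installation steps to execute.

    declared -- steps the package asks for, in execution order
    disabled -- steps the sysadmin switched off (config file or switch)
    only     -- if given, restrict the run to these steps (for re-runs)
    """
    return [
        step
        for step in declared
        if step not in disabled and (only is None or step in only)
    ]


declared = ["create-user", "copy-config", "install-script"]

# On the machine that populates the shared PREFIX: skip per-machine steps.
server_plan = plan_steps(declared, disabled={"create-user", "copy-config"})

# On each client: re-run just the per-machine steps.
client_plan = plan_steps(declared, only={"create-user", "copy-config"})
```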


1.4 fix open-ended dependencies

  GOAL: Currently, dependencies are open-ended in that any version
greater than the required minimum version is accepted.  However,
future versions might not be compatible with the version a package was
built against -- the most common cause is a shared library major
version bump.  Thus, we need a way to express that those newer
packages do not satisfy the dependency.  As we do not know which future
versions will be incompatible, we cannot encode that in the dependency
pattern of the ``parent'' package.

  Joerg suggested an elegant way to solve this: each package records
the lowest version of itself that it is still compatible with; when a
shared library major number is bumped, this is set to the current
version.  When checking whether a package satisfies a dependency, the
package is rejected if its lowest compatible version is higher than
the version requested by the ``parent'' package.
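
  A minimal sketch of that check, under simplifying assumptions:
versions are plain dotted numbers (real pkgsrc versions are richer),
and the field name compat_since is invented for this example.

```python
# Sketch only: dotted numeric versions stand in for real pkgsrc version
# strings, and "compat_since" is a hypothetical name for the lowest
# version a package is still compatible with.
from dataclasses import dataclass


def parse_version(text):
    """Turn a dotted version such as '1.2.3' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


@dataclass
class Candidate:
    name: str
    version: str
    compat_since: str


def satisfies(candidate, required_minimum):
    """Check an open-ended '>= required_minimum' dependency."""
    wanted = parse_version(required_minimum)
    if parse_version(candidate.version) < wanted:
        return False  # candidate is older than the required minimum
    # Reject if compatibility was broken after the version the parent wanted.
    return parse_version(candidate.compat_since) <= wanted


old_abi = Candidate("libfoo", "1.4", compat_since="1.0")
new_abi = Candidate("libfoo", "2.0", compat_since="2.0")
```

  With a parent built against libfoo>=1.2, old_abi is accepted while
new_abi is rejected, even though 2.0 is numerically greater than 1.2.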

  Another way would be to use the PROVIDES and REQUIRED lines from
+BUILD_INFO (which record information about shared libraries included
or used by the package).  However, that is more complicated and does
not handle incompatibilities for reasons other than shared library
major version bumps (e.g. command line incompatibilities of an
included utility).

  RATIONALE: We need a way to catch mismatches early, so we don't
install an inconsistent set of packages.  This also helps in-place
replacement of a package (without deleting all the packages that
depend on it), as we can check beforehand whether all depending
packages are compatible with the new version.


2. binary package creation improvements

2.1 build binary packages completely without root privileges

  GOAL: Build a working binary package without requiring root
privileges.  Joerg's user destdir mode gets us (mostly) there.

  RATIONALE: Currently, bulk builds usually run as root, so malicious
software could compromise the build sandbox and thus subsequently
built binary packages.  Building without root privileges greatly
reduces the amount of code run as root; for a typical package, no code
from the package would ever be run as root.


2.2 build binary packages for multiple options settings

  GOAL: Allow a bulk build to build multiple variants of a package,
e.g. with and without X11 support.  I would suggest a new variable,
PKG_OPTIONS_TAGS, that lists all the variants that will be built by a
full bulk build, and PKG_OPTIONS_TAG.tag that lists the options to use
for each variant.  The variant tag needs to be added to the file name
of the resulting binary package, and pkg_add needs to be taught to
handle those (allowing a user to select which variant to use, possibly
based on what's already installed).

  Dependencies do not get built multiple times (unless they have
options tags themselves).  A package should note which options it
requires a dependency to be built with, and pkg_add / the bulk build
code will pick a suitable one.
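
  To make the idea concrete, here is a small sketch.  The two variable
names come from the proposal above; the "+tag" file naming scheme, the
package version and the selection rule (first suitable variant unless
the user prefers another) are assumptions for illustration.

```python
# Sketch only.  Models the proposed variables as a dict:
#   PKG_OPTIONS_TAGS      = x11 nox11
#   PKG_OPTIONS_TAG.x11   = x11
#   PKG_OPTIONS_TAG.nox11 = (empty)
VARIANTS = {"x11": {"x11"}, "nox11": set()}


def binary_package_name(pkgname, tag):
    """Build a file name carrying the variant tag (naming is hypothetical)."""
    return f"{pkgname}+{tag}.tgz"


def pick_variant(variants, required_options, preferred=None):
    """Return the tag of a variant built with all required options.

    Returns None when no variant qualifies.
    """
    suitable = [
        tag for tag, options in variants.items()
        if required_options <= options
    ]
    if preferred in suitable:
        return preferred
    return suitable[0] if suitable else None


# A dependent package that needs the dependency built with X11 support:
chosen = pick_variant(VARIANTS, {"x11"})
file_name = binary_package_name("emacs-22.1", chosen)
```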


  RATIONALE: Currently, the biggest complaint against using the
options framework instead of separate packages (e.g. emacs,
emacs-nox11) is that with options, only one binary package is built
during a bulk build.  However, using the options framework scales
better and is less work to maintain, so it would be preferable.


2.3 sub-packages

  GOAL: Build multiple binary packages from one source package, each
containing part of the package (e.g. server and client).  Currently,
we have no framework to deal with this, and each package is manually
split into multiple packages, often resulting in parts being done
multiple times (at least the extraction and patching, sometimes even
compiling).  This new framework should also support building only a
subset of the sub-packages (so source users don't incur the full
build and dependency impact of parts they are not interested in).

  As I see it, the following would be needed:

  - the user can select which sub-packages should be built/installed
  - the package lists which files belong to which sub-package
    (possibly via patterns)
  - the package can optionally disable building the unwanted parts (if
    the build infrastructure allows it)
  - per sub-package dependencies
  - one sub-package can depend on another sub-package
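
  The second point, assigning files to sub-packages via patterns,
could look roughly like this.  The package layout, the sub-package
names and the first-match-wins rule are assumptions for this sketch.

```python
# Sketch only: file names, sub-package names and the first-match-wins
# rule are hypothetical.
from fnmatch import fnmatchcase

SUBPACKAGES = [
    ("client", ["bin/fooclient", "man/man1/fooclient.*"]),
    ("server", ["sbin/food", "etc/rc.d/*", "man/man8/*"]),
]


def assign_files(files, subpackages):
    """Map each file to the first sub-package whose pattern matches it.

    Returns (assignment, unassigned) so leftover files can be reported.
    """
    assignment = {name: [] for name, _ in subpackages}
    unassigned = []
    for path in files:
        for name, patterns in subpackages:
            if any(fnmatchcase(path, pattern) for pattern in patterns):
                assignment[name].append(path)
                break
        else:
            unassigned.append(path)
    return assignment, unassigned


files = [
    "bin/fooclient",
    "sbin/food",
    "man/man1/fooclient.1",
    "man/man8/food.8",
    "share/doc/foo/README",
]
assignment, unassigned = assign_files(files, SUBPACKAGES)
```

  Reporting unassigned files explicitly would help maintainers notice
when an update adds files that no sub-package claims.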

  RATIONALE: This would allow us to split big packages with a big
dependency tree (like gst-plugins) into multiple binary packages with
fine-grained dependencies, with less maintenance overhead than we have
now.

  NOTE: Details of how to achieve this still need to be worked out.
Most important will be ease of maintenance of such packages.


3. improve meta information storage

3.1 use sqlite3 for /var/db

  GOAL: Store the information currently spread over the
subdirectories and files in /var/db/pkg in an SQLite3 database.

  RATIONALE: It is easier to back up (it's just one file), easier to
update (use SQL) and easier to query (use SQL).  This would greatly
enhance what pkg_info can do while reducing its complexity.
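
  As an illustration of the kind of query this enables, here is a
sketch using an in-memory database.  The table layout is invented for
this example and is not a proposed schema.

```python
# Sketch only: the schema below is hypothetical.
import sqlite3


def open_pkgdb(path=":memory:"):
    """Open (and if needed initialise) a package database."""
    db = sqlite3.connect(path)
    db.executescript("""
        CREATE TABLE IF NOT EXISTS packages (
            name    TEXT PRIMARY KEY,
            version TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS files (
            path    TEXT PRIMARY KEY,
            package TEXT NOT NULL REFERENCES packages(name)
        );
    """)
    return db


def register(db, name, version, files):
    """Record an installed package and its files in one transaction."""
    with db:
        db.execute(
            "INSERT INTO packages (name, version) VALUES (?, ?)",
            (name, version),
        )
        db.executemany(
            "INSERT INTO files (path, package) VALUES (?, ?)",
            [(path, name) for path in files],
        )


def owner_of(db, path):
    """Return the name of the package owning path, or None."""
    row = db.execute(
        "SELECT package FROM files WHERE path = ?", (path,)
    ).fetchone()
    return row[0] if row else None


db = open_pkgdb()
register(db, "foo", "1.0", ["/usr/pkg/bin/foo", "/usr/pkg/man/man1/foo.1"])
```

  A lookup such as owner_of(db, "/usr/pkg/bin/foo") then becomes a
single indexed query instead of a scan over per-package files.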


3.2 use sqlite3 for pkg_summary

  GOAL: Use an SQLite3 database for the pkg_summary file describing a
collection of binary packages.

  RATIONALE: It is easier to query and easier to update for package
deletion or addition (e.g. during an incremental bulk build, or if
vulnerable packages are removed).  SQLite3 is public domain,
relatively small (about 2 MB of source) and has bindings for a lot of
languages, allowing package management tools written in various
languages to easily query the database.  A command line utility to
query the database should be included in pkg_install (to allow shell
script access).
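
  A sketch of the incremental updates mentioned above, again with an
invented table layout (the column set is an assumption, chosen to
resemble a few pkg_summary fields; package names are made up):

```python
# Sketch only: the "summary" table and its columns are hypothetical.
import sqlite3


def create_summary(db):
    db.execute(
        "CREATE TABLE IF NOT EXISTS summary ("
        " pkgname TEXT PRIMARY KEY,"
        " comment TEXT,"
        " file_name TEXT)"
    )


def add_package(db, pkgname, comment, file_name):
    """Add or refresh one entry, e.g. after an incremental bulk build."""
    with db:
        db.execute(
            "INSERT OR REPLACE INTO summary VALUES (?, ?, ?)",
            (pkgname, comment, file_name),
        )


def remove_package(db, pkgname):
    """Drop one entry, e.g. when a vulnerable package is withdrawn."""
    with db:
        db.execute("DELETE FROM summary WHERE pkgname = ?", (pkgname,))


def search(db, pattern):
    """Return package names matching a shell-style glob pattern."""
    rows = db.execute(
        "SELECT pkgname FROM summary WHERE pkgname GLOB ? ORDER BY pkgname",
        (pattern,),
    )
    return [row[0] for row in rows]


summary = sqlite3.connect(":memory:")
create_summary(summary)
add_package(summary, "foo-1.0", "example tool", "foo-1.0.tgz")
add_package(summary, "foo-doc-1.0", "example docs", "foo-doc-1.0.tgz")
add_package(summary, "bar-2.1", "another example", "bar-2.1.tgz")
remove_package(summary, "bar-2.1")
```

  Each addition or removal touches a single row, so the summary never
has to be regenerated from scratch.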